Abstract

In this paper, we study the problem of decomposing a superposition of a low-rank matrix and a sparse matrix when relatively few linear measurements are available. This problem arises in many data processing tasks, such as aligning multiple images or rectifying regular textures, where the goal is to recover a low-rank matrix with a large fraction of corrupted entries in the presence of a nonlinear domain transformation. We consider a natural convex heuristic for this problem, which is a variant of the recently proposed Principal Component Pursuit. We prove that, under suitable conditions, this convex program is guaranteed to recover the correct low-rank and sparse components despite the reduced measurements. Our analysis covers both random and deterministic measurement models.

1 Introduction 

Low-rank matrix recovery and approximation have been popular areas of research in many different fields. The popularity of low-rank matrices can be attributed to the fact that they arise in one of the most commonly used data models in real applications, namely when very high-dimensional data samples are assumed to lie approximately on a low-dimensional linear subspace. This model has been successfully employed in various problems such as face recognition [1], system identification [2], and information retrieval [3].

The most popular tool for low-rank matrix approximation is Principal Component Analysis (PCA) [H [S]. The basic idea of PCA is to find the "best low-rank approximation" (in an $\ell_2$ sense) to a given input matrix. Essentially, PCA finds a rank-$r$ approximation to a given data matrix $D \in \mathbb{R}^{m\times n}$ by solving the following problem:

$$\min_L\ \|D - L\| \quad \text{s.t.} \quad \mathrm{rank}(L) \le r,$$

where $\|\cdot\|$ denotes the matrix spectral norm. It is well known that the solution to this problem can be easily obtained by computing the singular value decomposition (SVD) of $D$ and retaining only the $r$ largest singular values and the corresponding singular vectors. Besides its ease of computation, the PCA estimate has been shown to be optimal in the presence of isotropic Gaussian noise. However, the biggest drawback of PCA is that it breaks down even when a single entry of the matrix is corrupted by an error of very large magnitude. Unfortunately, such large-magnitude, non-Gaussian errors often exist in real data. For instance, occlusions corrupt only a fraction of the pixels in an image, but the magnitude of the corruption can be quite large.
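A minimal NumPy sketch of the SVD recipe described above (our own illustration; the function name `pca_rank_r` and the test matrices are hypothetical). It checks that truncating the SVD yields a rank-$r$ matrix whose spectral-norm error equals the $(r+1)$-th singular value, and that a single grossly corrupted entry is enough to spoil the rank-$r$ estimate of an underlying low-rank matrix.

```python
import numpy as np

def pca_rank_r(D: np.ndarray, r: int) -> np.ndarray:
    """Rank-r truncated SVD of D (best rank-r approximation in spectral norm)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Keep the r largest singular values and their singular vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
D = rng.standard_normal((40, 30))
r = 5
L = pca_rank_r(D, r)
s = np.linalg.svd(D, compute_uv=False)
err = np.linalg.norm(D - L, 2)          # spectral norm of the residual
print(np.linalg.matrix_rank(L), np.isclose(err, s[r]))

# A single grossly corrupted entry spoils the estimate of a rank-2 matrix L0.
L0 = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
D_corrupt = L0.copy()
D_corrupt[0, 0] += 1e3
rel_err = np.linalg.norm(pca_rank_r(D_corrupt, 2) - L0) / np.linalg.norm(L0)
print(rel_err)
```

The relative error in the corrupted case stays large because one rank-2 "slot" is spent on the isolated spike instead of on $L_0$.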

There have been many works in the literature that try to make PCA robust to such gross, non-Gaussian errors, and many models and solutions have been proposed. Here we consider the specific problem of recovering a low-rank matrix $L_0 \in \mathbb{R}^{m\times n}$ from corrupted observations $D = L_0 + S_0$, where $S_0 \in \mathbb{R}^{m\times n}$ is a sparse matrix whose nonzero entries may have arbitrary magnitude. This problem has been studied in detail recently by various works in the literature [HI [3 [H]. It has been shown that, under rather broad conditions, the following convex program succeeds in recovering $L_0$ from $D$:

$$\min_{L,S}\ \|L\|_* + \lambda\|S\|_1 \quad \text{s.t.} \quad D = L + S, \tag{1}$$

where $\|\cdot\|_*$ denotes the nuclear norm,¹ $\|\cdot\|_1$ denotes the $\ell_1$-norm,² and $\lambda > 0$ is a weighting factor. This method has been dubbed Principal Component Pursuit (PCP) in [5]. In addition to being computationally tractable, it comes with very strong theoretical guarantees of recovery. Furthermore, follow-up works have shown that PCP is stable in the presence of additive Gaussian noise [5] and can recover $L_0$ even when the corruption matrix $S_0$ is not so sparse [10].
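Program (1) can be handed to any generic convex solver. As one concrete possibility, and not a method proposed or analyzed in this paper, the sketch below uses an inexact augmented Lagrange multiplier (ADMM-style) iteration that alternates singular value thresholding for $L$ with entrywise soft thresholding for $S$. The default $\lambda = 1/\sqrt{\max(m,n)}$, the penalty schedule (`mu`, `rho`), and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft thresholding: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp(D, lam=None, tol=1e-7, max_iter=500):
    """Inexact ALM sketch for min ||L||_* + lam*||S||_1 s.t. D = L + S."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm2 = np.linalg.norm(D, 2)
    Y = D / max(norm2, np.abs(D).max() / lam)   # dual variable initialization
    mu, rho = 1.25 / norm2, 1.5
    mu_max = mu * 1e7
    S = np.zeros_like(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S                           # primal residual
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S

rng = np.random.default_rng(1)
m, n, r = 80, 60, 3
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.05                # Bernoulli support of S0
S0 = np.where(mask, 10.0 * rng.choice([-1.0, 1.0], size=(m, n)), 0.0)
L_hat, S_hat = pcp(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```

On this synthetic instance (rank 3, about 5% of entries corrupted), `L_hat` should match `L0` to high relative accuracy; the growth factor for `mu` trades the number of iterations against per-step accuracy.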

Besides being of theoretical interest, this convex optimization framework for low-rank matrix recovery has been employed very successfully to solve real problems in computer vision, such as photometric stereo. However, in practice, much more data, especially imagery data, can be viewed as low-rank only after some transformation is applied. For instance, an image of a building facade becomes a low-rank matrix after the perspective distortion is rectified [12], and a set of face images of the same person becomes linearly correlated only after the images are properly aligned [13]. In our terminology, we can write $D \circ \tau = L_0 + S_0$, where $\tau$ belongs to a certain transformation group. As the transformation $\tau$ is also unknown, one natural way to recover $L_0$, $S_0$, and $\tau$ together is to approximate the nonlinear equation by its linearization at the current estimate $\hat\tau$:

$$D \circ \hat\tau + \sum_{i=1}^{p} J_i\, d\tau_i = L + S,$$

where $\{J_i\}$ are the Jacobians of $D \circ \tau$ with respect to the parameters $\{\tau_i\}$ of $\tau$. One can then incrementally update the estimate of $\tau$ as $\hat\tau + d\tau$ by solving the following convex program:

$$\min_{L,S,d\tau}\ \|L\|_* + \lambda\|S\|_1 \quad \text{s.t.} \quad D + \sum_{i=1}^{p} J_i\, d\tau_i = L + S, \tag{2}$$

where, for brevity, $D$ now stands for the current warped data $D \circ \hat\tau$. Empirically, this scheme has been shown to work rather effectively in both the image rectification problem [12] and the image alignment problem [13].

Although the convex program (2) was proposed in the same spirit as PCP, its linear constraint is different, and hence the theoretical guarantees for PCP shown in [B] [3 do not directly apply to this case. In this work, we attempt to fill this gap between theory and practice and to understand under what conditions the above extended version of PCP is expected to work correctly.

Let $Q$ be the linear subspace of $\mathbb{R}^{m\times n}$ that is the orthogonal complement of the span of all the $J_i$'s; assuming the $J_i$'s are linearly independent, its dimension is $q = mn - p$. Clearly, we can rewrite the above program in the following form:

$$\min_{L,S}\ \|L\|_* + \lambda\|S\|_1 \quad \text{s.t.} \quad \mathcal{P}_Q D = \mathcal{P}_Q(L + S), \tag{3}$$

where $\mathcal{P}_Q$ is the orthogonal projection onto the linear subspace $Q$. This program is a variation of PCP (1) in which the number of linear constraints has been reduced from $mn$ to $q = mn - p$. Indeed, if $Q$ is the entire space, then it reduces to PCP. If $Q$ is the linear subspace of matrices with support in $\Omega \subseteq [m]\times[n]$, then we have the special case of recovering $L_0$ from $D$ when only a subset of the entries of $D$ is available. This case is akin to the low-rank matrix completion problem [10, 15, 11], and theoretical guarantees have been derived in [6, 17]. However, to the best of our knowledge, the case with a general subspace $Q$ has not yet been analyzed in detail in the literature.
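The passage from (2) to (3) can be checked numerically. In this NumPy sketch, random matrices stand in for the Jacobians $J_i$ (they are placeholders, not image Jacobians). An orthonormal basis of $\mathrm{span}\{J_i\} = Q^\perp$ is computed from the column-stacked $J_i$, and $\mathcal{P}_Q = I - \mathcal{P}_{Q^\perp}$ is seen to annihilate every linearized motion $\sum_i J_i\, d\tau_i$, so the constraint in (2) implies $\mathcal{P}_Q D = \mathcal{P}_Q(L+S)$. Conversely, if $\mathcal{P}_Q(D - L - S) = 0$, then $D - L - S \in Q^\perp$ and some $d\tau$ satisfies (2).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 12, 10, 4
J = rng.standard_normal((p, m, n))            # placeholder "Jacobians" J_1..J_p

# Orthonormal basis of Q^perp = span{J_i} in R^{mn} (column-stacking vec).
A = np.stack([Ji.flatten(order="F") for Ji in J], axis=1)   # mn x p
Qperp, _ = np.linalg.qr(A)                                  # orthonormal columns

def proj_Q(X):
    """Orthogonal projection onto Q, the orthogonal complement of span{J_i}."""
    x = X.flatten(order="F")
    x = x - Qperp @ (Qperp.T @ x)
    return x.reshape((m, n), order="F")

dtau = rng.standard_normal(p)
motion = np.tensordot(dtau, J, axes=1)        # sum_i J_i * dtau_i, shape (m, n)
print(np.abs(proj_Q(motion)).max())           # numerically zero

# dim Q equals the rank of P_Q viewed as an mn x mn matrix.
P = np.eye(m * n) - Qperp @ Qperp.T
print(np.linalg.matrix_rank(P), m * n - p)
```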

Our motivation to study when the convex program (3) succeeds with such reduced linear constraints is at least twofold. First, the relationships between $Q$, $L_0$, and $S_0$ will give us a better understanding of the types of images and signals for which techniques such as those used in [12, 13] are expected to work well. Second, we want to know how many general linear measurements we can remove without

¹ The sum of all singular values.

² The sum of the absolute values of all matrix entries.
sacrificing the robustness of PCP for recovering the low-rank matrix $L_0$. In these applications, the number of constraints removed corresponds to the dimension of the transformation group. In the image rectification problem, the dimension $p$ of the transformation group is typically fixed with respect to the size of the matrix; in the image alignment problem, however, the dimension typically grows linearly in $m$ (or $n$). In either case, we need to know whether the program (3) can tolerate up to a constant fraction of gross errors.

1.1 Notation 

We first establish a set of notations that will be used throughout this work. We assume that the matrices $L_0$, $S_0$, and $D$ in (3) have size $m \times n$. Without any loss of generality, we assume that $n \le m$. We denote the rank of $L_0$ by $r$. Let $L_0 = U\Sigma V^*$ be the reduced singular value decomposition (SVD) of $L_0$. We define a linear subspace $T$ as follows:

$$T = \{UX^* + YV^* : X \in \mathbb{R}^{n\times r},\ Y \in \mathbb{R}^{m\times r}\}. \tag{4}$$

Basically, $T$ contains all matrices that share a common row space or column space with $L_0$. We denote by $\Omega$ the support of $S_0$. By a slight abuse of notation, we also denote by $\Omega$ the subspace of matrices whose support is contained in the support of $S_0$. For any subspace $S \subseteq \mathbb{R}^{m\times n}$, $\mathcal{P}_S: \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ denotes the orthogonal projection operator onto $S$.

For any $X, Y \in \mathbb{R}^{m\times n}$, we define their inner product as $\langle X, Y\rangle = \mathrm{trace}(X^*Y) = \sum_{ij} X_{ij}Y_{ij}$. We let $\|\cdot\|_F$ and $\|\cdot\|$ denote the matrix Frobenius norm and spectral norm, respectively. We also denote the $\ell_\infty$-norm of a matrix $X$ as $\|X\|_\infty = \max_{ij}|X_{ij}|$. We say that an event $E$ occurs with high probability if $\mathbb{P}[E^c] \le C m^{-\alpha}$ for some positive numerical constants $C$ and $\alpha$. Here, $E^c$ denotes the complement of the event $E$.



1.2 Main Assumptions 

Obviously, successful recovery cannot be guaranteed without proper assumptions on the low-rank matrix $L_0$, the sparse matrix $S_0$, and the subspace $Q$ involved. For instance, if $L_0$ is itself a sparse matrix, then there is a fundamental ambiguity in the solution to be recovered. Here, we outline the assumptions that we will use throughout this paper. The assumptions we make on $L_0$ and $S_0$ are essentially the same as those for PCP [5]; for completeness, we list them below.

We assume that each entry of the matrix belongs to the support of the sparse matrix $S_0$ independently with probability $\rho$. We denote this as $\mathrm{supp}(S_0) \sim \mathrm{Ber}(\rho)$. For simplicity, we assume that the signs of the nonzero entries are also random.³ For the low-rank matrix $L_0$, we assume that the subspace $T$ defined in (4) is incoherent with the standard basis (and hence with the sparse matrix $S_0$). To be precise, let us denote the standard bases in $\mathbb{R}^m$ and $\mathbb{R}^n$ by $\{e_i\}$ and $\{e_j\}$, respectively, where $i \in [m]$ and $j \in [n]$. We assume (as in [H]) that

$$\max_i \|U^*e_i\|^2 \le \frac{\mu r}{m}, \qquad \max_j \|V^*e_j\|^2 \le \frac{\mu r}{n}, \qquad \|UV^*\|_\infty \le \sqrt{\frac{\mu r}{mn}}, \tag{5}$$

for some $\mu > 0$. We recall that $r = \mathrm{rank}(L_0)$. Since $\|\mathcal{P}_T e_ie_j^*\|_F^2 = \|U^*e_i\|^2 + \|V^*e_j\|^2 - \|U^*e_i\|^2\|V^*e_j\|^2$ and $n \le m$, it follows from the above assumptions that, for any $(i,j) \in [m]\times[n]$,

$$\|\mathcal{P}_T e_ie_j^*\|_F^2 \le \frac{2\mu r}{n}. \tag{6}$$

Furthermore, it can be shown that $\|\mathcal{P}_{T^\perp}X\| \le \|X\|$ for any $X \in \mathbb{R}^{m\times n}$.
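The quantities in (5) and (6) are easy to evaluate for a given $L_0$. The NumPy sketch below assumes the standard closed form $\mathcal{P}_T X = UU^*X + XVV^* - UU^*XVV^*$ for the projection onto (4). The helper `incoherence_mu` (hypothetical, for illustration) returns the smallest $\mu$ for which all three conditions in (5) hold. The script then checks (6) entry by entry, as well as $\|\mathcal{P}_{T^\perp}X\| \le \|X\|$.

```python
import numpy as np

def proj_T(X, U, V):
    """Projection onto T in (4): P_T X = UU^*X + XVV^* - UU^*XVV^*."""
    UUX = U @ (U.T @ X)
    return UUX + (X @ V) @ V.T - (UUX @ V) @ V.T

def incoherence_mu(U, V):
    """Smallest mu for which all three conditions in (5) hold (hypothetical helper)."""
    m, r = U.shape
    n = V.shape[0]
    mu_U = m / r * np.max(np.sum(U ** 2, axis=1))          # max_i ||U^* e_i||^2 <= mu r / m
    mu_V = n / r * np.max(np.sum(V ** 2, axis=1))          # max_j ||V^* e_j||^2 <= mu r / n
    mu_UV = m * n / r * np.max(np.abs(U @ V.T)) ** 2       # ||UV^*||_inf <= sqrt(mu r / (mn))
    return max(mu_U, mu_V, mu_UV)

rng = np.random.default_rng(3)
m, n, r = 50, 40, 3                                        # n <= m, as assumed in the text
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Uf, _, Vtf = np.linalg.svd(L0, full_matrices=False)
U, V = Uf[:, :r], Vtf[:r, :].T                             # reduced SVD factors of L0
mu = incoherence_mu(U, V)

# Check (6) entry by entry: ||P_T e_i e_j^*||_F^2 <= 2 mu r / n.
worst = 0.0
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        worst = max(worst, np.linalg.norm(proj_T(E, U, V)) ** 2)
print(mu, worst, 2 * mu * r / n)

# ||P_{T^perp} X|| <= ||X|| for an arbitrary X.
X = rng.standard_normal((m, n))
print(np.linalg.norm(X - proj_T(X, U, V), 2) <= np.linalg.norm(X, 2) + 1e-12)
```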

In addition to the above assumptions, we define the following two properties of linear subspaces. We say that a linear subspace $S \subseteq \mathbb{R}^{m\times n}$ is

³ The random sign assumption is not entirely necessary for obtaining the same qualitative results. One can follow the derandomization process in [6] to remove this assumption if needed.
• $\nu$-coherent if there exists an orthonormal basis $\{G_i\}$ for $S$ satisfying

$$\max_i \|G_i\|^2 \le \frac{\nu}{n}; \tag{7}$$

• $\gamma$-constrained if

$$\max_{i,j} \|\mathcal{P}_S e_ie_j^*\|_F^2 \le \gamma. \tag{8}$$

If $S$ is a random subspace, we say it is $\nu$-coherent and $\gamma$-constrained if (7) and (8), respectively, hold with high probability.

In this paper, we deal with two different assumptions on the subspace $Q$, as outlined below. We will see later that it is in fact convenient to make our assumptions on the subspace $Q^\perp$ rather than on $Q$ itself. This is partly motivated by the model in (2) that was used in [12, 13], where the $J_i$'s are essentially a basis for $Q^\perp$. Thus, any assumption on $Q^\perp$ can be easily interpreted in terms of the $J_i$'s, which helps us connect our results to these applications more directly. We denote by $p$ the dimension of the subspace $Q^\perp$.

• Random subspace model. Let $G_1, G_2, \dots, G_p \in \mathbb{R}^{m\times n}$ be an orthonormal basis for $Q^\perp$. We assume that this basis is chosen uniformly at random from all possible orthonormal sets of size $p$ in $\mathbb{R}^{m\times n}$. It can be shown that each of the $G_k$'s is identical in distribution to $H/\|H\|_F$, where the entries of $H \in \mathbb{R}^{m\times n}$ are i.i.d. according to a Gaussian distribution with mean zero and variance $1/mn$.

• Deterministic subspace model. Under this model, we assume that $Q^\perp$ is a fixed subspace which is $\nu$-coherent, for some $\nu \ge 1$.
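Both subspace properties, and the random subspace model itself, can be explored numerically. In this NumPy sketch (our own illustration), a random $Q^\perp$ is drawn as the span of $p$ i.i.d. Gaussian matrices, which is uniformly distributed among $p$-dimensional subspaces, and QR supplies an orthonormal basis. The hypothetical helper `nu_of_basis` reports the smallest $\nu$ for which this particular basis satisfies (7); since (7) allows any orthonormal basis, it is only an upper bound on the best $\nu$. The helper `gamma_of_subspace` reports the smallest $\gamma$ in (8).

```python
import numpy as np

def random_orthobasis(m, n, p, rng):
    """Orthonormal basis of a random p-dim subspace of R^{m x n}.

    Returns B (mn x p, orthonormal columns) and the same basis as p matrices G_k.
    """
    H = rng.standard_normal((m * n, p)) / np.sqrt(m * n)   # i.i.d. N(0, 1/mn) entries
    B, _ = np.linalg.qr(H)
    G = B.T.reshape(p, m, n)                                # G_k[i, j] = B[i*n + j, k]
    return B, G

def nu_of_basis(G):
    """Smallest nu such that this basis satisfies (7): max_k ||G_k||^2 <= nu / n."""
    n = min(G.shape[1], G.shape[2])
    return n * max(np.linalg.norm(Gk, 2) ** 2 for Gk in G)

def gamma_of_subspace(B):
    """Smallest gamma in (8): max_{ij} ||P_S e_i e_j^*||_F^2 = max squared row norm of B."""
    return np.max(np.sum(B ** 2, axis=1))

rng = np.random.default_rng(12)
m, n, p = 40, 30, 10
B, G = random_orthobasis(m, n, p, rng)
print(nu_of_basis(G), gamma_of_subspace(B))
```

Because each $G_k$ has unit Frobenius norm and rank at most $n$, $\|G_k\|^2 \ge 1/n$, so any basis gives $\nu \ge 1$, which is consistent with the requirement $\nu \ge 1$ above.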

1.3 Main Results 

With the above notation, we now briefly describe the main results we prove in this work. Although our results and proof methodology resemble those in [5], there are some important differences here. In particular, we will see that the assumptions we make on the subspace $Q$ greatly influence the kind of recovery guarantees that can be derived.

As mentioned earlier, we will consider two different assumptions on the subspace $Q$. In the first one, we assume a random subspace model for $Q^\perp$. The main result that we prove under this random subspace model is summarized in the following theorem.

Theorem 1 (Random Reduction). Fix any $C_p > 0$, and let $Q^\perp$ be a $p$-dimensional random subspace of $\mathbb{R}^{m\times n}$ ($n \le m$), $L_0$ a rank-$r$, $\mu$-incoherent matrix, and $\mathrm{supp}(S_0) \sim \mathrm{Ber}(\rho)$. Then, provided that



$\rho_0 \in (0, 1)$ are numerical constants.

Remark 1. In Theorem 1, "with high probability" means with probability at least $1 - \beta(C_p)\, m^{-c}$, with $c > 0$ numerical.

The scaling in this result covers several applications of interest: in [12], $p$ is a fixed constant, while in [13], $p$ scales linearly with $n$. Therefore, the above result covers both of these applications in terms of the number of reduced constraints. It states that with such reduced constraints, the convex program (3) can recover the low-rank matrix $L_0$ under essentially the same conditions as PCP. In particular, it can tolerate up to a constant fraction of errors.

In a closely related work [18], we have shown that one can expect the convex program (3) to work in much more highly compressive scenarios. More precisely, the dimension of the subspace $Q$ only needs to be on the order of $(mr + k)\log^2 m$, where $k$ is the number of nonzero entries of $S_0$; this is only a polylogarithmic factor more than the intrinsic degrees of freedom of the unknown $L_0$ and $S_0$. One nice feature of [18] is that its proof framework is very modular, and the techniques are applicable even to more general structured signals beyond low-rank and sparse ones. Nevertheless, that result does not subsume the result here, because in such highly compressive scenarios we cannot expect to tolerate errors in up to a constant fraction of the matrix entries. Obtaining the results in Theorem 1 and Theorem 2 seems to require arguments that are specially tailored to the PCP problem.

There is a common limitation for all results that are based on a random assumption for $Q$ or $Q^\perp$: the random assumption does not hold in many real applications. For instance, in [12, 13], the subspace $Q^\perp$ is typically spanned by a set of image Jacobians, which may not behave like random matrices. Therefore, it is desirable to have deterministic conditions on $Q^\perp$ (or $Q$) that can be verified for the given data, together with theoretical guarantees for recovery when $Q^\perp$ is a deterministic subspace. This is the second scenario that we consider in this work, for which we have the following result:

Theorem 2 (Deterministic Reduction). Fix any $p \in \mathbb{Z}_+$, $\alpha \ge 1$, and $\nu \ge 1$. Then there exists $C_r > 0$ such that if $Q^\perp$ is a $\nu$-coherent $p$-dimensional subspace of $\mathbb{R}^{m\times n}$ ($n \le m \le \alpha n$), $L_0$ is a rank-$r$, $\mu$-incoherent matrix, and $\mathrm{supp}(S_0) \sim \mathrm{Ber}(\rho)$, then with high probability $(L_0, S_0)$ is the unique optimal solution to (3) with $\lambda = m^{-1/2}$, provided that

f/ n \'/' / n n 1 

r<amin<^ ^-^ , ,— }, p < po, (10) 

yi^^p^Q! J \avpp J p login 

where $C_r, \rho_0 \in (0, 1)$ are numerical constants.

Remark 2. Here, "with high probability" means with probability at least $1 - \beta(p, \alpha, \nu)\, m^{-c}$, with $c > 0$ numerical.

The $\nu$-coherence condition essentially requires that there exist an orthonormal basis for $Q^\perp$ whose elements have spectral norms bounded above by $O(n^{-1/2})$. This condition can be verified directly once the subspace $Q$ or $Q^\perp$ is given (say, as the span of the Jacobians). It is also significantly weaker than the random subspace assumption in Theorem 1.

Because the assumptions are weaker, the orders of growth in Theorem 2 are quite a bit more restrictive than those in Theorem 1. Nevertheless, this result can be very useful for the practical problems that we encountered in image rectification, where the dimension of the transformation group is typically fixed (i.e., does not change with the matrix dimension). Theorem 2 suggests that we should expect the program to work at least for deformation groups whose dimension is fixed. Although empirical results suggest that $p$ could even grow as $O(n)$, we leave that for future investigation.

The remainder of this paper is organized as follows. In Section 2, we derive the optimality conditions for $(L_0, S_0)$ to be the optimal solution to the convex program (3). In particular, we derive the conditions that a certain dual certificate must satisfy in order to establish our main results. In Section 3, we provide a constructive procedure for the aforementioned dual certificate. In Section 4, we describe our main assumptions and the detailed steps of the proof of Theorem 1. In Section 5, we outline the proof of Theorem 2. Although the proof for the deterministic case follows the same strategy as that for the random case, there are a few important differences. In particular, we highlight the parts where the proof deviates significantly from that of Theorem 1.

2 Existence of Dual Certificate 

In this section, we prove the following lemma, which establishes sufficient conditions for $(L_0, S_0)$ to be the unique optimal solution to (3).
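Lemma 1. Assume that $\dim(Q^\perp \oplus T \oplus \Omega) = \dim(Q^\perp) + \dim(T) + \dim(\Omega)$. Then $(L_0, S_0)$ is the unique optimal solution to (3) if there exists a pair $(W, F) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$ satisfying

$$UV^* + W = \lambda(\mathrm{sgn}(S_0) + F) \in Q, \tag{11}$$

with $\mathcal{P}_T W = 0$, $\|W\| < 1$, $\mathcal{P}_\Omega F = 0$, and $\|F\|_\infty < 1$.

Proof. Consider a feasible solution to (3) of the form $(L_0 + H_L, S_0 - H_S)$. Clearly, we have that $\mathcal{P}_Q H_L = \mathcal{P}_Q H_S$. Under the conditions of the lemma, we will show that this pair does not minimize the cost function in (3) unless $H_L = H_S = 0$.

We first use the fact that $\|\cdot\|_*$ and $\|\cdot\|_1$ are convex functions. Consider any pair $(W_0, F_0) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$ satisfying $\mathcal{P}_T W_0 = 0$, $\|W_0\| \le 1$, $\mathcal{P}_\Omega F_0 = 0$, and $\|F_0\|_\infty \le 1$. Then $UV^* + W_0$ is a subgradient of $\|\cdot\|_*$ at $L_0$, and $\mathrm{sgn}(S_0) + F_0$ is a subgradient of $\|\cdot\|_1$ at $S_0$. Therefore,

$$\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \langle UV^* + W_0, H_L\rangle - \lambda\langle \mathrm{sgn}(S_0) + F_0, H_S\rangle.$$

By Hölder's inequality (and the duality of norms), it is possible to choose $W_0$ and $F_0$ such that

$$\langle W_0, H_L\rangle = \|\mathcal{P}_{T^\perp}H_L\|_*, \qquad \langle F_0, H_S\rangle = -\|\mathcal{P}_{\Omega^\perp}H_S\|_1.$$

Then, we have

$$\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \langle UV^*, H_L\rangle - \lambda\langle \mathrm{sgn}(S_0), H_S\rangle + \|\mathcal{P}_{T^\perp}H_L\|_* + \lambda\|\mathcal{P}_{\Omega^\perp}H_S\|_1.$$

By assumption, we have

$$UV^* = \lambda(\mathrm{sgn}(S_0) + F) - W,$$

with $\lambda(\mathrm{sgn}(S_0) + F) \in Q$. Substituting for $UV^*$ and using $\mathcal{P}_Q H_L = \mathcal{P}_Q H_S$, we get

$$\langle UV^*, H_L\rangle - \lambda\langle \mathrm{sgn}(S_0), H_S\rangle = \lambda\langle F, H_S\rangle - \langle W, H_L\rangle.$$

Substituting this in the above inequality, we get

$$\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \|\mathcal{P}_{T^\perp}H_L\|_* + \lambda\|\mathcal{P}_{\Omega^\perp}H_S\|_1 + \lambda\langle F, H_S\rangle - \langle W, H_L\rangle.$$

Let $\beta = \max\{\|W\|, \|F\|_\infty\} < 1$. Since $W \in T^\perp$ and $F \in \Omega^\perp$, Hölder's inequality gives $|\langle W, H_L\rangle| \le \beta\|\mathcal{P}_{T^\perp}H_L\|_*$ and $|\langle F, H_S\rangle| \le \beta\|\mathcal{P}_{\Omega^\perp}H_S\|_1$, so

$$\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + (1-\beta)\|\mathcal{P}_{T^\perp}H_L\|_* + (1-\beta)\lambda\|\mathcal{P}_{\Omega^\perp}H_S\|_1.$$

For a nonzero perturbation $(H_L, H_S)$, the last two terms on the right-hand side can both vanish only if $H_L \in T$ and $H_S \in \Omega$. In that case $H_L - H_S \ne 0$, since the dimension assumption implies $T \cap \Omega = \{0\}$. We also have $\mathcal{P}_Q(H_L - H_S) = 0$, which implies $H_L - H_S \in Q^\perp$; this is a contradiction since $Q^\perp \cap (T \oplus \Omega) = \{0\}$. Thus, we have

$$\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 > \|L_0\|_* + \lambda\|S_0\|_1$$

for any nonzero feasible perturbation $(H_L, H_S)$. □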
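The Hölder step in the proof above is constructive. Assuming the standard formula $\mathcal{P}_{T^\perp}X = (I - UU^*)X(I - VV^*)$, one valid choice is $W_0 = \tilde U\tilde V^*$ from a compact SVD $\mathcal{P}_{T^\perp}H_L = \tilde U\tilde\Sigma\tilde V^*$ (keeping only the nonzero singular values), together with $F_0 = -\mathrm{sgn}(\mathcal{P}_{\Omega^\perp}H_S)$. The NumPy sketch below (random data; all names are ours) verifies the two identities and the constraints on $(W_0, F_0)$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 30, 20, 2
U, _ = np.linalg.qr(rng.standard_normal((m, r)))   # stand-ins for the singular vectors of L_0
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
omega = rng.random((m, n)) < 0.1                   # support Omega of S_0
H_L = rng.standard_normal((m, n))
H_S = rng.standard_normal((m, n))

# W_0: polar factor of P_{T^perp} H_L, keeping only nonzero singular values
# so that W_0 itself stays in T^perp.
Pu, Pv = np.eye(m) - U @ U.T, np.eye(n) - V @ V.T
Z = Pu @ H_L @ Pv                                  # P_{T^perp} H_L
Uz, sz, Vzt = np.linalg.svd(Z, full_matrices=False)
k = int(np.sum(sz > 1e-10 * sz[0]))
W0 = Uz[:, :k] @ Vzt[:k, :]

# F_0: minus the sign pattern of H_S off the support Omega.
F0 = np.where(omega, 0.0, -np.sign(H_S))

ip_W = np.sum(W0 * H_L)                            # <W_0, H_L>
ip_F = np.sum(F0 * H_S)                            # <F_0, H_S>
print(np.isclose(ip_W, sz.sum()), np.isclose(ip_F, -np.abs(H_S[~omega]).sum()))
```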

It is often convenient to relax the equality constraints on the dual certificate given in (11). Thus, similar to the proof outline in [6, 16], we now provide a slightly relaxed dual certificate condition.

Fact 1. Let $S_1$ and $S_2$ be two linear subspaces in $\mathbb{R}^{m\times n}$ with $S_1 \subseteq S_2$. Then, for any $X \in \mathbb{R}^{m\times n}$, we have $\mathcal{P}_{S_1}X = \mathcal{P}_{S_1}\mathcal{P}_{S_2}X$, and consequently, $\|\mathcal{P}_{S_1}X\|_F \le \|\mathcal{P}_{S_2}X\|_F$.






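Lemma 2. Suppose that $\dim(Q^\perp \oplus T \oplus \Omega) = \dim(Q^\perp) + \dim(T) + \dim(\Omega)$. Let $\Gamma = Q \cap T^\perp$, so that $\Gamma^\perp = Q^\perp \oplus T$. Assume that $\|\mathcal{P}_\Omega\mathcal{P}_{\Gamma^\perp}\| \le 1/2$ and $\lambda < 1$. Then $(L_0, S_0)$ is the unique optimal solution to (3) if there exists a pair $(W, F) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$ satisfying

$$UV^* + W = \lambda(\mathrm{sgn}(S_0) + F + \mathcal{P}_\Omega D) \in Q, \tag{12}$$

with $\mathcal{P}_T W = 0$, $\|W\| \le 1/2$, $\mathcal{P}_\Omega F = 0$, $\|F\|_\infty \le 1/2$, and $\|\mathcal{P}_\Omega D\|_F \le 1/4$.

Proof. Proceeding along the same lines as in the proof of Lemma 1, for any feasible perturbation $(H_L, H_S)$ we get

$$\begin{aligned}
\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 &\ge \|L_0\|_* + \lambda\|S_0\|_1 + \tfrac{1}{2}\|\mathcal{P}_{T^\perp}H_L\|_* + \tfrac{\lambda}{2}\|\mathcal{P}_{\Omega^\perp}H_S\|_1 + \lambda\langle \mathcal{P}_\Omega D, H_S\rangle \\
&\ge \|L_0\|_* + \lambda\|S_0\|_1 + \tfrac{1}{2}\|\mathcal{P}_{T^\perp}H_L\|_* + \tfrac{\lambda}{2}\|\mathcal{P}_{\Omega^\perp}H_S\|_1 - \tfrac{\lambda}{4}\|\mathcal{P}_\Omega H_S\|_F.
\end{aligned}$$

We note that

$$\begin{aligned}
\|\mathcal{P}_\Omega H_S\|_F &\le \|\mathcal{P}_\Omega\mathcal{P}_\Gamma H_S\|_F + \|\mathcal{P}_\Omega\mathcal{P}_{\Gamma^\perp}H_S\|_F \\
&\le \|\mathcal{P}_\Omega\mathcal{P}_\Gamma H_L\|_F + \tfrac{1}{2}\|H_S\|_F \\
&\le \|\mathcal{P}_\Gamma H_L\|_F + \tfrac{1}{2}\|\mathcal{P}_\Omega H_S\|_F + \tfrac{1}{2}\|\mathcal{P}_{\Omega^\perp}H_S\|_F \\
&\le \|\mathcal{P}_{T^\perp}H_L\|_F + \tfrac{1}{2}\|\mathcal{P}_\Omega H_S\|_F + \tfrac{1}{2}\|\mathcal{P}_{\Omega^\perp}H_S\|_F.
\end{aligned}$$

In the second step above, we have used the fact that $\mathcal{P}_\Gamma H_L = \mathcal{P}_\Gamma H_S$ (since $\Gamma \subseteq Q$), and the final inequality follows from Fact 1 (since $\Gamma \subseteq T^\perp$). Thus, we have

$$\|\mathcal{P}_\Omega H_S\|_F \le 2\|\mathcal{P}_{T^\perp}H_L\|_F + \|\mathcal{P}_{\Omega^\perp}H_S\|_F \le 2\|\mathcal{P}_{T^\perp}H_L\|_* + \|\mathcal{P}_{\Omega^\perp}H_S\|_1.$$

Putting it all together, we get

$$\|L_0 + H_L\|_* + \lambda\|S_0 - H_S\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \frac{1-\lambda}{2}\|\mathcal{P}_{T^\perp}H_L\|_* + \frac{\lambda}{4}\|\mathcal{P}_{\Omega^\perp}H_S\|_1.$$

Since $\lambda < 1$, both coefficients are positive, and the desired result follows, as in the proof of Lemma 1, from the fact that $Q^\perp \cap (T \oplus \Omega) = \{0\}$. □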



3 Proof Strategy 

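By Lemma 2, in order to prove either Theorem 1 or Theorem 2, it is sufficient (under the assumptions of that lemma) to produce a dual certificate $W \in \mathbb{R}^{m\times n}$ satisfying

$$\begin{aligned}
&\mathcal{P}_{Q^\perp}W = -\mathcal{P}_{Q^\perp}(UV^*), \\
&\mathcal{P}_T W = 0, \quad \|W\| < 1/2, \\
&\|\mathcal{P}_\Omega(UV^* - \lambda\,\mathrm{sgn}(S_0) + W)\|_F \le \lambda/4, \\
&\|\mathcal{P}_{\Omega^\perp}(UV^* + W)\|_\infty < \lambda/2.
\end{aligned} \tag{13}$$

Indeed, given such a $W$, setting $F = \lambda^{-1}\mathcal{P}_{\Omega^\perp}(UV^* + W)$ and $\mathcal{P}_\Omega D = \lambda^{-1}\mathcal{P}_\Omega(UV^* + W) - \mathrm{sgn}(S_0)$ yields a pair satisfying (12).

To prove Theorems 1 and 2 under the above conditions, we construct the dual certificate $W$ following a strategy similar to that in the original analysis of PCP [5]. However, the extra projection of the observations onto the subspace $Q$ adds significant difficulty to various technical parts of the proof. In this section, we outline the basic components for constructing such a certificate and then provide detailed proofs for each component in the next sections. For simplicity, throughout our discussion below, we set $\Gamma = Q \cap T^\perp$, so that $\Gamma^\perp = Q^\perp \oplus T$.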

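We assume that the support of the sparse matrix is distributed as $\Omega \sim \mathrm{Ber}(\rho)$ for some small $\rho \in (0, 1)$. This is, of course, equivalent to assuming that $\Omega^c \sim \mathrm{Ber}(1 - \rho)$. Suppose that $\Omega_1, \Omega_2, \dots, \Omega_{j_0}$ are independent support sets such that $\Omega_j \sim \mathrm{Ber}(q)$ for all $j$. Then $\Omega^c$ and $\bigcup_{j=1}^{j_0}\Omega_j$ have the same probability distribution if $\rho = (1 - q)^{j_0}$. We now propose a construction for the dual certificate $W = W^L + W^S + W^Q$ as follows, using a combination of the golfing scheme proposed in [16] and the least-norm approach.

1. Construction of $W^L$ using the golfing scheme. Starting with $Y_0 = 0$, we iteratively define

$$Y_j = Y_{j-1} + q^{-1}\mathcal{P}_{\Omega_j}\mathcal{P}_{\Gamma^\perp}(UV^* - Y_{j-1}), \tag{14}$$

and set

$$W^L = \mathcal{P}_\Gamma Y_{j_0}, \tag{15}$$

where $j_0 = \lceil 2\log m \rceil$.

2. Construction of $W^S$ by a least-norm solution. We define $W^S$ by the following least-norm problem:

$$W^S = \arg\min_X \|X\|_F \quad \text{subject to} \quad \mathcal{P}_\Omega X = \lambda\,\mathrm{sgn}(S_0), \quad \mathcal{P}_{\Gamma^\perp}X = 0. \tag{16}$$

3. Construction of $W^Q$ by least squares. We define $W^Q$ by the following least-squares problem:

$$W^Q = \arg\min_X \|X\|_F \quad \text{subject to} \quad \mathcal{P}_{Q^\perp}X = -\mathcal{P}_{Q^\perp}(UV^*), \quad \mathcal{P}_\Pi X = 0, \tag{17}$$

where $\Pi = \Omega \oplus T$.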
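The golfing construction (14)-(15) can be simulated on a toy instance. In this NumPy sketch (our own illustration, with hypothetical sizes and $q = 1/2$), matrices are flattened row-major. An orthonormal basis of $\Gamma^\perp = Q^\perp \oplus T$ is extracted from an SVD of the union of spanning sets, $\Omega^c$ is formed as the union of the $j_0$ sets $\Omega_j \sim \mathrm{Ber}(q)$, and the residual $\|\mathcal{P}_{\Gamma^\perp}(UV^* - Y_j)\|_F$ is recorded after each step.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r, p, q = 30, 30, 1, 5, 0.5
N = m * n
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
target = (U @ V.T).ravel()                        # vec(UV^*)

# Spanning set of T: all U[:, a] e_j^* and e_i V[:, b]^*, plus a random spanning set of Q^perp.
T_span = [np.outer(U[:, a], np.eye(n)[j]).ravel() for a in range(r) for j in range(n)]
T_span += [np.outer(np.eye(m)[i], V[:, b]).ravel() for b in range(r) for i in range(m)]
G = rng.standard_normal((N, p))                   # columns span Q^perp
A = np.column_stack(T_span + [G[:, k] for k in range(p)])

# Orthonormal basis of Gamma^perp = Q^perp + T (drop numerically dependent directions).
Ua, sa, _ = np.linalg.svd(A, full_matrices=False)
B = Ua[:, sa > 1e-10 * sa[0]]

def proj_gamma_perp(x):
    return B @ (B.T @ x)

j0 = int(np.ceil(2 * np.log(m)))
Y = np.zeros(N)
residuals = [np.linalg.norm(proj_gamma_perp(target - Y))]
omega_c = np.zeros(N, dtype=bool)
for _ in range(j0):
    omega_j = rng.random(N) < q                   # Omega_j ~ Ber(q)
    omega_c |= omega_j
    # Y_j = Y_{j-1} + q^{-1} P_{Omega_j} P_{Gamma^perp}(UV^* - Y_{j-1})
    Y = Y + np.where(omega_j, proj_gamma_perp(target - Y), 0.0) / q
    residuals.append(np.linalg.norm(proj_gamma_perp(target - Y)))

W_L = (Y - proj_gamma_perp(Y)).reshape(m, n)      # W^L = P_Gamma Y_{j0}
print([f"{x:.1e}" for x in residuals], 1 - omega_c.mean())
```

With $q = 1/2$ and $j_0 = \lceil 2\log 30\rceil = 7$, the implied corruption fraction is $\rho = (1-q)^{j_0} \approx 0.008$. The residuals should shrink roughly geometrically, and since $W^L$ lies in $\Gamma$, both $\mathcal{P}_T W^L$ and $\mathcal{P}_{Q^\perp}W^L$ vanish up to rounding.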

We note that under our assumptions (see Section 1.2), both least-squares programs above are feasible with high probability under both the random subspace model and the deterministic subspace model. This is because we will later show that the spectral norms of the linear operators $\mathcal{P}_\Omega\mathcal{P}_{\Gamma^\perp}$ and $\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi$ can be bounded below unity with high probability.
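Both (16) and (17) are minimum-norm solutions of underdetermined linear systems, so on small instances they can be computed directly. The sketch below (NumPy; sizes, $\rho$, and the random $Q^\perp$ are hypothetical) writes each constraint set as $Cx = b$ in vectorized form and uses `np.linalg.lstsq`, which returns the minimum-norm solution for a consistent system. Consistency here is exactly the feasibility discussed above; the helper `least_norm` raises if it fails.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r, p, rho = 30, 30, 1, 5, 0.05
lam = 1.0 / np.sqrt(m)                           # lambda = m^{-1/2}, as in Theorem 2
N = m * n
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
UV = (U @ V.T).ravel()

def orth(A, tol=1e-10):
    """Orthonormal basis for the column span of A (SVD with rank truncation)."""
    Ua, sa, _ = np.linalg.svd(A, full_matrices=False)
    return Ua[:, sa > tol * sa[0]]

I_m, I_n = np.eye(m), np.eye(n)
B_T = orth(np.column_stack(
    [np.outer(U[:, a], I_n[j]).ravel() for a in range(r) for j in range(n)]
    + [np.outer(I_m[i], V[:, b]).ravel() for b in range(r) for i in range(m)]))
B_Qperp = orth(rng.standard_normal((N, p)))      # random Q^perp
B_Gperp = orth(np.column_stack([B_T, B_Qperp]))  # Gamma^perp = Q^perp + T

omega = np.flatnonzero(rng.random(N) < rho)      # support of S_0
signs = rng.choice([-1.0, 1.0], size=omega.size) # sgn(S_0) on Omega
E_omega = np.eye(N)[omega]                       # rows select the entries in Omega

def least_norm(C, b):
    """Minimum-norm solution of C x = b; raises if the system is inconsistent."""
    x = np.linalg.lstsq(C, b, rcond=None)[0]
    if not np.allclose(C @ x, b, atol=1e-8):
        raise ValueError("constraints are infeasible")
    return x

# (16): P_Omega X = lam * sgn(S_0), P_{Gamma^perp} X = 0
W_S = least_norm(np.vstack([E_omega, B_Gperp.T]),
                 np.concatenate([lam * signs, np.zeros(B_Gperp.shape[1])]))
# (17): P_{Q^perp} X = -P_{Q^perp}(UV^*), P_Pi X = 0 with Pi = Omega + T
W_Q = least_norm(np.vstack([B_Qperp.T, E_omega, B_T.T]),
                 np.concatenate([-B_Qperp.T @ UV, np.zeros(omega.size + B_T.shape[1])]))
print(np.linalg.norm(W_S.reshape(m, n), 2), np.linalg.norm(W_Q.reshape(m, n), 2))
```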

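Since $W^L, W^S \in \Gamma = Q \cap T^\perp$ and $W^Q$ is orthogonal to $\Pi \supseteq T$, the certificate $W = W^L + W^S + W^Q$ satisfies $\mathcal{P}_T W = 0$ and $\mathcal{P}_{Q^\perp}W = -\mathcal{P}_{Q^\perp}(UV^*)$ by construction; moreover, $\mathcal{P}_\Omega W^S = \lambda\,\mathrm{sgn}(S_0)$ and $\mathcal{P}_\Omega W^Q = 0$. Thus, to prove that $W^L + W^S + W^Q$ is a valid dual certificate, we have to establish the following:

$$\|W^L + W^S + W^Q\| < 1/2, \tag{18}$$

$$\|\mathcal{P}_\Omega(UV^* + W^L)\|_F \le \lambda/4, \tag{19}$$

$$\|\mathcal{P}_{\Omega^\perp}(UV^* + W^L + W^S + W^Q)\|_\infty < \lambda/2. \tag{20}$$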

Lemma 3. Assume that $\Omega \sim \mathrm{Ber}(\rho)$ for some small $\rho \in (0, 1)$ and that the assumptions (5) and (7) hold. Then the matrix $W^L$ obeys, with high probability,

1. $\|W^L\| < 1/4$,

2. $\|\mathcal{P}_\Omega(UV^* + W^L)\|_F < \lambda/4$,

3. $\|\mathcal{P}_{\Omega^\perp}(UV^* + W^L)\|_\infty < \lambda/4$.

Lemma 4. In addition to the assumptions in the previous lemma, assume that the signs of the nonzero entries of $S_0$ are i.i.d. random. Then the matrix $W^S$ obeys, with high probability,

1. $\|W^S\| < 1/8$,

2. $\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty < \lambda/8$.

Lemma 5. Assume that $\Omega \sim \mathrm{Ber}(\rho)$ for some small $\rho \in (0, 1)$ and that the assumptions (5) and (7) hold. Then the matrix $W^Q$ obeys, with high probability,

1. $\|W^Q\| < 1/8$,

2. $\|\mathcal{P}_{\Omega^\perp}W^Q\|_\infty < \lambda/8$.

Together, the above lemmas establish a valid dual certificate that satisfies (18)-(20).

4 Random Reduction: Proof of Theorem 1

In this section, we provide a detailed proof of Lemmas 3, 4, and 5 for the case when $Q^\perp$ is a random subspace. Before proceeding to the main steps of the proof, we first establish some important properties of, and relationships among, the different quantities involved in the problem.

4.1 Preliminaries 

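Lemma 6. Let $Q^\perp$ be a linear subspace distributed according to the random subspace model described earlier. Then, for any $(i,j) \in [m]\times[n]$, with high probability,

$$\|\mathcal{P}_{Q^\perp}e_ie_j^*\|_F \le 4\sqrt{\frac{p\log(mnp)}{mn}}. \tag{21}$$

Proof. For any $(i,j) \in [m]\times[n]$, we have

$$\|\mathcal{P}_{Q^\perp}e_ie_j^*\|_F^2 = \sum_{k=1}^{p}|\langle G_k, e_ie_j^*\rangle|^2 \le p\,\max_k \|G_k\|_\infty^2. \tag{22}$$

We now derive a bound for $\|G_k\|_\infty$. Suppose that $M \in \mathbb{R}^{m\times n}$ is a random matrix whose entries are i.i.d. according to the standard normal distribution. Let us define

$$H = \frac{1}{\sqrt{mn}}M$$

and $G = H/\|H\|_F$. Clearly, $G$ is identical in distribution to $G_1, G_2, \dots, G_p$. We know that, for any $(i,j) \in [m]\times[n]$,

$$\mathbb{P}[|M_{ij}| > t] \le \sqrt{\frac{2}{\pi}}\,\frac{e^{-t^2/2}}{t}.$$

Therefore, using a union bound, we get

$$\mathbb{P}[\|M\|_\infty > t] \le \sqrt{\frac{2}{\pi}}\,\frac{mn}{t}\,e^{-t^2/2},$$

or equivalently,

$$\mathbb{P}\Big[\|H\|_\infty > \frac{t}{\sqrt{mn}}\Big] \le \sqrt{\frac{2}{\pi}}\,\frac{mn}{t}\,e^{-t^2/2}.$$

Now, if we have $p$ random matrices $H_1, H_2, \dots, H_p$, independent and identical in distribution to $H$, then

$$\mathbb{P}\Big[\max_k \|H_k\|_\infty > \frac{t}{\sqrt{mn}}\Big] \le \sqrt{\frac{2}{\pi}}\,\frac{mnp}{t}\,e^{-t^2/2}.$$

Setting $t = \sqrt{4\log(mnp)}$, we get

$$\mathbb{P}\Big[\max_k \|H_k\|_\infty > \sqrt{\frac{4\log(mnp)}{mn}}\Big] \le \sqrt{\frac{2}{\pi}}\,\frac{1}{mnp\sqrt{4\log(mnp)}}.$$

Thus, with high probability, we have that $\max_k \|H_k\|_\infty \le \sqrt{4\log(mnp)/(mn)}$. It can be shown that $\min_k \|H_k\|_F \ge 1/2$ with high probability. Thus, we have that

$$\max_k \|G_k\|_\infty^2 \le \frac{16\log(mnp)}{mn}$$

with high probability. The desired result follows from (22). □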
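A Monte Carlo check of Lemma 6 (our illustration; sizes and seed are arbitrary). For a random $Q^\perp$ with a vectorized orthonormal basis $B$, $\|\mathcal{P}_{Q^\perp}e_ie_j^*\|_F^2$ is the squared norm of a row of $B$, so the left-hand side of (21) is a maximum of row norms. The sketch also checks the two intermediate facts used in the proof.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, p = 60, 50, 5
N = m * n
H = rng.standard_normal((N, p)) / np.sqrt(N)       # vectorized H_1..H_p, entries N(0, 1/mn)
B, _ = np.linalg.qr(H)                             # orthonormal basis of Q^perp = span{H_k}

lhs = np.max(np.sum(B ** 2, axis=1))               # max_{ij} ||P_{Q^perp} e_i e_j^*||_F^2
rhs = 16 * p * np.log(N * p) / N                   # square of the bound in (21)
G = H / np.linalg.norm(H, axis=0)                  # H_k / ||H_k||_F
print(lhs, rhs)
print(np.linalg.norm(H, axis=0).min(), np.abs(G).max() ** 2, 16 * np.log(N * p) / N)
```

On such instances the observed maximum tends to sit well below the bound, which reflects the union-bound constants rather than the true scaling.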



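Lemma 7. Assume that $p \le mn/4$. Let $Q^\perp$ be a linear subspace distributed according to the random subspace model. Then, with high probability, we have

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| \le 8\Big(\sqrt{\frac{p}{mn}} + \sqrt{\frac{(m+n)r}{mn}}\Big). \tag{23}$$

Proof. Firstly, we note that $Q^\perp$ is identical in distribution to a subspace spanned by $p$ independent random matrices, each of whose entries are i.i.d. according to a Gaussian distribution with mean zero and variance $1/mn$. Let $\mathcal{H}: \mathbb{R}^p \to \mathbb{R}^{m\times n}$ be the linear operator defined as follows:

$$\mathcal{H}(x) = \sum_{k=1}^{p} x_k H_k,$$

where the $H_k$'s are independent random matrices, each of whose entries are i.i.d. according to a Gaussian distribution with mean zero and variance $1/mn$. Then $\mathcal{P}_{Q^\perp}$ has the same distribution as the operator $\mathcal{H}(\mathcal{H}^*\mathcal{H})^{-1}\mathcal{H}^*$. Therefore, we have

$$\begin{aligned}
\mathbb{P}\Big[\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| > 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)\Big]
&= \mathbb{P}\Big[\|\mathcal{H}(\mathcal{H}^*\mathcal{H})^{-1}\mathcal{H}^*\mathcal{P}_T\| > 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)\Big] \\
&\le \mathbb{P}\Big[\|\mathcal{H}(\mathcal{H}^*\mathcal{H})^{-1}\|\,\|\mathcal{H}^*\mathcal{P}_T\| > 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)\Big] \\
&\le \mathbb{P}\big[\|\mathcal{H}(\mathcal{H}^*\mathcal{H})^{-1}\| > 4\big] + \mathbb{P}\Big[\|\mathcal{H}^*\mathcal{P}_T\| > 2\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)\Big].
\end{aligned}$$

Suppose that $R \in \mathbb{R}^{mn\times p}$ is a random matrix whose entries are i.i.d. according to a Gaussian distribution with mean zero and variance $1/mn$. It is easy to see that, if we vectorize all the matrices, then $R$ is the matrix analogue of the operator $\mathcal{H}$. Therefore, $\|\mathcal{H}(\mathcal{H}^*\mathcal{H})^{-1}\|$ has the same distribution as $(\sigma_{\min}(R))^{-1}$. Let $R' = \sqrt{mn}\,R$. Clearly, the entries of $R'$ are i.i.d. according to the standard normal distribution. Using the concentration results for 1-Lipschitz functions (see Proposition 2.18 in [19]) and the distribution of singular values of random Gaussian matrices [20], it is possible to show that

$$\mathbb{P}\big[\sigma_{\min}(R') < \sqrt{mn} - \sqrt{p} - t\big] \le e^{-t^2/2}$$

for any $t > 0$. Consequently, we have that

$$\mathbb{P}\Big[\sigma_{\min}(R) < 1 - \sqrt{\tfrac{p}{mn}} - t\Big] \le e^{-mnt^2/2}.$$

Setting $t = 1/4$ and using our assumption that $p \le mn/4$ (so that $\sqrt{p/mn} \le 1/2$), we get

$$\mathbb{P}\big[\|\mathcal{H}(\mathcal{H}^*\mathcal{H})^{-1}\| > 4\big] = \mathbb{P}\big[\sigma_{\min}(R) < 1/4\big] \le e^{-mn/32}.$$

We now note that $\|\mathcal{H}^*\mathcal{P}_T\| = \|\mathcal{P}_T\mathcal{H}\|$ is identical in distribution to $\|M\|$, where $M \in \mathbb{R}^{\dim(T)\times p}$ is a random matrix whose entries are i.i.d. $\mathcal{N}(0, 1/mn)$. This is because the isotropic Gaussian distribution is rotation-invariant. Hence, without any loss of generality, we can assume that the operator $\mathcal{P}_T$ preserves only the first $\dim(T) = (m+n)r - r^2 \le (m+n)r$ components of the (vectorized) basis elements $H_1, \dots, H_p$. Once again invoking Proposition 2.18 in [19], we can show that

$$\mathbb{P}\Big[\|M\| > \sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}} + t\Big] \le e^{-mnt^2/2}.$$

Setting $t = \max\big\{\sqrt{p/mn},\ \sqrt{(m+n)r/mn}\big\}$, it follows that

$$\mathbb{P}\Big[\|\mathcal{H}^*\mathcal{P}_T\| > 2\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)\Big] \le \min\big\{e^{-p/2},\ e^{-(m+n)r/2}\big\}.$$

Putting it all together, we get

$$\mathbb{P}\Big[\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| > 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)\Big] \le e^{-mn/32} + \min\big\{e^{-p/2},\ e^{-(m+n)r/2}\big\}.$$

Thus, we have that

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| \le 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{(m+n)r}{mn}}\Big)$$

with high probability. □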
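A numerical check of (23) on a single random draw (our illustration; we take $r = 1$ and $m = n = 200$ so that the right-hand side is below 1). It uses $\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| = \|\mathcal{P}_T\mathcal{P}_{Q^\perp}\| = \|\mathcal{P}_T B\|$, where $B$ is an orthonormal basis of $Q^\perp$, together with the standard closed form $\mathcal{P}_T X = UU^*X + XVV^* - UU^*XVV^*$.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, r, p = 200, 200, 1, 10
N = m * n
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def proj_T(X):
    """P_T X = U U^* X + X V V^* - U U^* X V V^*."""
    UUX = U @ (U.T @ X)
    return UUX + (X @ V) @ V.T - (UUX @ V) @ V.T

B, _ = np.linalg.qr(rng.standard_normal((N, p)))   # orthonormal basis of a random Q^perp
# ||P_{Q^perp} P_T|| = ||P_T P_{Q^perp}|| = spectral norm of [vec(P_T G_1), ..., vec(P_T G_p)]
PTB = np.column_stack([proj_T(B[:, k].reshape(m, n)).ravel() for k in range(p)])
lhs = np.linalg.norm(PTB, 2)
rhs = 8 * (np.sqrt(p / N) + np.sqrt((m + n) * r / N))
print(lhs, rhs)
```

In such draws the observed norm is typically close to $\sqrt{p/mn} + \sqrt{\dim(T)/mn}$, i.e. roughly the bound divided by 8, so the constant in (23) is conservative.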



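Lemma 8. Let $Q^\perp$ be a linear subspace distributed according to the random subspace model, and let $\Omega \sim \mathrm{Ber}(\rho)$. Then, with high probability, we have

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\| \le 8\Big(\sqrt{\frac{p}{mn}} + \sqrt{\frac{5\rho}{4}}\Big). \tag{24}$$

Proof. Proceeding along the same lines as in the proof of the previous lemma (with $\Omega$ in place of $T$, so that the relevant Gaussian matrix has $|\Omega|$ rows) and conditioning on $\Omega$, we get, whenever $|\Omega| \le \tfrac{5}{4}\rho mn$,

$$\mathbb{P}\Big[\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\| > 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{5\rho}{4}}\Big)\ \Big|\ \Omega\Big] \le e^{-mn/32} + \min\big\{e^{-p/2},\ e^{-5mn\rho/8}\big\}.$$

Using Bernstein's inequality, it is possible to show that

$$\mathbb{P}\big[|\Omega| > mn\rho(1 + \delta)\big] \le \exp\Big(-\frac{mn\rho\,\delta^2}{2(1 + \delta/3)}\Big)$$

for any $\delta \in (0, 1)$. We set $\delta = 1/4$. Thus, we have

$$\mathbb{P}\Big[\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\| > 8\Big(\sqrt{\tfrac{p}{mn}} + \sqrt{\tfrac{5\rho}{4}}\Big)\Big] \le e^{-mn/32} + \min\big\{e^{-p/2},\ e^{-5mn\rho/8}\big\} + e^{-3mn\rho/104},$$

so (24) holds with high probability. □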

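Lemma 9. Let $\Omega \sim \mathrm{Ber}(\rho)$. Then, with high probability,

$$\|\mathcal{P}_\Omega\mathcal{P}_T\|^2 \le \rho + \epsilon, \tag{25}$$

provided that $1 - \rho \ge C_0\,\epsilon^{-2}\,\dfrac{\mu r\log m}{n}$ for some numerical constant $C_0 > 0$.

Proof. See Corollary 2.7 in [6]. □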



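We now prove the following two results, which help us establish incoherence relations for subspaces obtained as a direct sum of two incoherent subspaces.

Lemma 10. Let $S_1$ and $S_2$ be any two linear subspaces in $\mathbb{R}^{m\times n}$ satisfying $\|\mathcal{P}_{S_1}\mathcal{P}_{S_2}\| \le \alpha < 1$. We define $S = S_1 \oplus S_2$. Then, for any $X \in \mathbb{R}^{m\times n}$, we have

$$\|\mathcal{P}_S X\|_F^2 \le (1 - \alpha)^{-1}\big(\|\mathcal{P}_{S_1}X\|_F^2 + \|\mathcal{P}_{S_2}X\|_F^2\big). \tag{26}$$

Proof. We denote by $\mathrm{vec}: \mathbb{R}^{m\times n} \to \mathbb{R}^{mn}$ the operation of converting a matrix to a vector by stacking its columns one below another. Suppose that $d_1$ and $d_2$ are the dimensions of the subspaces $S_1$ and $S_2$, respectively. Then there exist matrices $B_1 \in \mathbb{R}^{mn\times d_1}$ and $B_2 \in \mathbb{R}^{mn\times d_2}$ whose columns constitute orthonormal bases for (the vectorizations of) $S_1$ and $S_2$, respectively.

Let $M = [B_1\ B_2]$. Clearly, the columns of $M$ constitute a basis for the subspace $S$ in $\mathbb{R}^{mn}$. Hence, for any $X \in \mathbb{R}^{m\times n}$, its projection onto $S$ can be expressed as follows:

$$\mathrm{vec}(\mathcal{P}_S X) = M(M^*M)^{-1}M^*\,\mathrm{vec}(X).$$

We note that $\|B_1^*\mathrm{vec}(X)\|_2 = \|\mathcal{P}_{S_1}X\|_F$ and $\|B_2^*\mathrm{vec}(X)\|_2 = \|\mathcal{P}_{S_2}X\|_F$. Therefore, we have

$$\begin{aligned}
\|\mathcal{P}_S X\|_F^2 &= \|\mathrm{vec}(\mathcal{P}_S X)\|_2^2 = \|M(M^*M)^{-1}M^*\mathrm{vec}(X)\|_2^2 \\
&\le \|M(M^*M)^{-1}\|^2\,\|M^*\mathrm{vec}(X)\|_2^2 \\
&= \|M(M^*M)^{-1}\|^2\big(\|\mathcal{P}_{S_1}X\|_F^2 + \|\mathcal{P}_{S_2}X\|_F^2\big).
\end{aligned}$$

Let $M^\dagger = (M^*M)^{-1}M^*$ denote the Moore-Penrose pseudoinverse of $M$. It is evident that $\|M^\dagger\| = \|M(M^*M)^{-1}\|$. But we know that $\|M^\dagger\| = (\sigma_{\min}(M))^{-1}$, where $\sigma_{\min}(M)$ is the smallest nonzero singular value of $M$. Since $B_1$ and $B_2$ have orthonormal columns, $M^*M = \begin{bmatrix} I & B_1^*B_2 \\ B_2^*B_1 & I \end{bmatrix}$ with $\|B_1^*B_2\| = \|\mathcal{P}_{S_1}\mathcal{P}_{S_2}\| \le \alpha$, and we can show that $(\sigma_{\min}(M))^2 = \lambda_{\min}(M^*M) \ge 1 - \alpha$, where $\lambda_{\min}(M^*M)$ is the smallest eigenvalue of $M^*M$.⁴ Therefore, we have

$$\begin{aligned}
\|\mathcal{P}_S X\|_F^2 &\le (\sigma_{\min}(M))^{-2}\big(\|\mathcal{P}_{S_1}X\|_F^2 + \|\mathcal{P}_{S_2}X\|_F^2\big) \\
&\le (1 - \alpha)^{-1}\big(\|\mathcal{P}_{S_1}X\|_F^2 + \|\mathcal{P}_{S_2}X\|_F^2\big).
\end{aligned}$$

□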
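A quick numerical sanity check of Lemma 10 and of the eigenvalue bound used in its proof, working directly with vectorized subspaces of $\mathbb{R}^N$, $N = mn$ (the random subspaces and sizes are our own choice).

```python
import numpy as np

rng = np.random.default_rng(9)
N, d1, d2 = 50, 6, 8                                 # N plays the role of mn
B1, _ = np.linalg.qr(rng.standard_normal((N, d1)))   # orthonormal basis of S_1
B2, _ = np.linalg.qr(rng.standard_normal((N, d2)))   # orthonormal basis of S_2
alpha = np.linalg.norm(B1.T @ B2, 2)                 # ||P_{S1} P_{S2}|| = ||B1^* B2||

M = np.hstack([B1, B2])
lam_min = np.min(np.linalg.eigvalsh(M.T @ M))        # should be >= 1 - alpha

BS, _ = np.linalg.qr(M)                              # orthonormal basis of S = S1 (+) S2
x = rng.standard_normal(N)                           # plays the role of vec(X)
lhs = np.linalg.norm(BS.T @ x) ** 2                  # ||P_S X||_F^2
rhs = (np.linalg.norm(B1.T @ x) ** 2 + np.linalg.norm(B2.T @ x) ** 2) / (1 - alpha)
print(alpha, lam_min, 1 - alpha, lhs, rhs)
```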



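Suppose that $\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| \le 1/2$.⁵ Then, applying Lemma 10 (with $\alpha = 1/2$) to $Q^\perp$ and $T$, together with (21) and (6), it follows that

$$\|\mathcal{P}_{\Gamma^\perp}e_ie_j^*\|_F^2 \le 4\Big(\frac{8p\log(mnp)}{mn} + \frac{\mu r}{n}\Big) \tag{27}$$

with high probability, for all $(i,j) \in [m]\times[n]$. In other words, with high probability, when $Q^\perp$ is distributed according to the random subspace model, the subspace $\Gamma^\perp = Q^\perp \oplus T$ is $\gamma$-constrained with $\gamma = 4\big(\frac{8p\log(mnp)}{mn} + \frac{\mu r}{n}\big)$. We further note that $\gamma\log m = O(1/\log m)$ under the conditions of Theorem 1. This fact will be used frequently in our proof below.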

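Lemma 11. Let $S_1$, $S_2$, and $S_3$ be any three linear subspaces in $\mathbb{R}^{m\times n}$ satisfying $\dim(S_1 \oplus S_2 \oplus S_3) = \dim(S_1) + \dim(S_2) + \dim(S_3)$, $\|\mathcal{P}_{S_1}\mathcal{P}_{S_2}\| \le \alpha_{1,2} < 1$, $\|\mathcal{P}_{S_2}\mathcal{P}_{S_3}\| \le \alpha_{2,3} < 1$, and $\|\mathcal{P}_{S_3}\mathcal{P}_{S_1}\| \le \alpha_{3,1} < 1$. We define $S = S_1 \oplus S_2$. Then, we have

$$\|\mathcal{P}_S\mathcal{P}_{S_3}\| \le \sqrt{\frac{\alpha_{3,1}^2 + \alpha_{2,3}^2}{1 - \alpha_{1,2}}}. \tag{28}$$

Proof. The proof is a simple application of Lemma 10. We note that, for any $X \in \mathbb{R}^{m\times n}$,

$$\begin{aligned}
\|\mathcal{P}_S\mathcal{P}_{S_3}X\|_F^2 &\le (1 - \alpha_{1,2})^{-1}\big(\|\mathcal{P}_{S_1}\mathcal{P}_{S_3}X\|_F^2 + \|\mathcal{P}_{S_2}\mathcal{P}_{S_3}X\|_F^2\big) \\
&\le (1 - \alpha_{1,2})^{-1}\big(\|\mathcal{P}_{S_1}\mathcal{P}_{S_3}\|^2 + \|\mathcal{P}_{S_2}\mathcal{P}_{S_3}\|^2\big)\|X\|_F^2 \\
&\le (1 - \alpha_{1,2})^{-1}\big(\alpha_{3,1}^2 + \alpha_{2,3}^2\big)\|X\|_F^2.
\end{aligned}$$

It follows that

$$\|\mathcal{P}_S\mathcal{P}_{S_3}\| \le \sqrt{\frac{\alpha_{3,1}^2 + \alpha_{2,3}^2}{1 - \alpha_{1,2}}}.$$

□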
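Similarly, (28) can be checked on three random subspaces of $\mathbb{R}^N$ (an illustration only; the dimensions and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(10)
N = 60
B1, B2, B3 = (np.linalg.qr(rng.standard_normal((N, d)))[0] for d in (5, 7, 4))

def pnorm(Ba, Bb):
    """||P_A P_B|| computed from orthonormal bases Ba, Bb."""
    return np.linalg.norm(Ba.T @ Bb, 2)

a12, a23, a31 = pnorm(B1, B2), pnorm(B2, B3), pnorm(B3, B1)
BS = np.linalg.qr(np.hstack([B1, B2]))[0]       # orthonormal basis of S = S1 (+) S2
lhs = pnorm(BS, B3)
rhs = np.sqrt((a31 ** 2 + a23 ** 2) / (1 - a12))
print(lhs, rhs)
```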

Lemma 12. Let ~ Ber(p) and be 7-constrained. Then, for any e € (0, 1), with high probability, 

\\Vt^ - P^^VT^VnVvi- II < e, (29) 
provided that p> C ■ e^^-flogm for some numerical constant C > 0. 

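Proof. The proof is very similar to that of Theorem 4.1 in [14]. We highlight the main steps here. For each $(i,j) \in [m]\times[n]$, we define binary random variables $\delta_{ij}$, each taking the value 1 if $(i,j) \in \Omega$ and 0 otherwise. We note that

$$\mathcal{P}_\Gamma - \rho^{-1}\mathcal{P}_\Gamma\mathcal{P}_\Omega\mathcal{P}_\Gamma = \sum_{(i,j)}\left(1 - \rho^{-1}\delta_{ij}\right)\mathcal{P}_\Gamma(e_ie_j^*)\otimes\mathcal{P}_\Gamma(e_ie_j^*),$$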

¹ Since $M$ has full column rank, $M^*M$ is positive definite.

² From Lemma 7 and the assumptions of Theorem 1, this is true with high probability for sufficiently large $m, n$.
where $\otimes$ denotes the outer or tensor product between matrices. Applying a concentration result for operators of the above form, as established in [21], we have, with high probability,

$$\|\mathcal{P}_\Gamma - \rho^{-1}\mathcal{P}_\Gamma\mathcal{P}_\Omega\mathcal{P}_\Gamma\| \le C'\sqrt{\frac{\log(mn)}{\rho}}\,\max_{(i,j)}\|\mathcal{P}_\Gamma e_ie_j^*\|_F \tag{30}$$

$$\le C'\sqrt{\frac{\gamma\log(mn)}{\rho}}, \tag{31}$$

provided that the right-hand side is smaller than 1. Here, $C' > 0$ is a numerical constant. The desired result follows by noting that $n \le m$, so that $\log(mn) \le 2\log m$, and bounding the right-hand side by $\epsilon \in (0,1)$. □
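To see the quantity controlled by Lemma 12 concretely, the sketch below specializes to $\Gamma = T$ (that is, $Q^\perp = \{0\}$, the classical PCP setting), forms $\mathcal{P}_T$ as an explicit matrix acting on row-major vectorizations, and evaluates $\|\mathcal{P}_T - \rho^{-1}\mathcal{P}_T\mathcal{P}_\Omega\mathcal{P}_T\|$ for a few sampling rates. NumPy, the problem sizes, and the helper names `tangent_projector` and `deviation` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tangent_projector(U, V):
    """Matrix of P_T X = UU^T X + X VV^T - UU^T X VV^T acting on row-major vec(X)."""
    m, n = U.shape[0], V.shape[0]
    PU, PV = U @ U.T, V @ V.T
    # Row-major vectorization satisfies vec(A X B) = kron(A, B^T) vec(X).
    return np.kron(PU, np.eye(n)) + np.kron(np.eye(m), PV) - np.kron(PU, PV)

def deviation(PT, mask, rho):
    """Spectral norm of P_T - rho^{-1} P_T P_Omega P_T for the sampling pattern `mask`."""
    D = np.diag(mask.ravel().astype(float))
    return np.linalg.norm(PT - PT @ D @ PT / rho, 2)

rng = np.random.default_rng(0)
m, n, r = 15, 12, 1
U = np.linalg.qr(rng.standard_normal((m, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
PT = tangent_projector(U, V)
for rho in (0.2, 0.5, 0.8):
    mask = rng.random((m, n)) < rho               # Omega ~ Ber(rho)
    print(rho, deviation(PT, mask, rho))
```

With full sampling ($\rho = 1$, every entry observed) the deviation is exactly zero, which is a convenient check of the construction.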

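Lemma 13. Let $Z \in \Gamma$ be fixed, let $\Gamma$ be $\gamma$-constrained, and let $\Omega \sim \mathrm{Ber}(\rho)$. Then, with high probability,

$$\|Z - \rho^{-1}\mathcal{P}_\Gamma\mathcal{P}_\Omega Z\|_\infty \le \epsilon\|Z\|_\infty, \tag{32}$$

provided that $\rho \ge C_0\,\epsilon^{-2}\gamma\log m$ for some numerical constant $C_0 > 64/3$.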
Proof. Let $\delta_{ij}$ be a sequence of independent Bernoulli random variables such that

$$\delta_{ij} = \begin{cases} 1, & \text{if } (i,j) \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$

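We define $Z' = Z - \rho^{-1}\mathcal{P}_\Gamma\mathcal{P}_\Omega Z$. Then,

$$Z' = \sum_{(i,j)}\left(1 - \rho^{-1}\delta_{ij}\right)Z_{ij}\,\mathcal{P}_\Gamma(e_ie_j^*).$$

For any $(i_0, j_0) \in [m]\times[n]$, we can express $Z'_{i_0j_0}$ as a sum of independent random variables as shown below:

$$Z'_{i_0j_0} = \sum_{(i,j)} R_{ij} = \sum_{(i,j)}\left(1 - \rho^{-1}\delta_{ij}\right)Z_{ij}\,\langle\mathcal{P}_\Gamma(e_ie_j^*), e_{i_0}e_{j_0}^*\rangle.$$

It is easy to show that the $R_{ij}$'s are zero-mean random variables with variance given by

$$\mathrm{Var}(R_{ij}) = (1-\rho)\rho^{-1}|Z_{ij}|^2\,|\langle\mathcal{P}_\Gamma(e_ie_j^*), e_{i_0}e_{j_0}^*\rangle|^2.$$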



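Therefore,

$$\begin{aligned}
\sum_{(i,j)}\mathrm{Var}(R_{ij}) &= (1-\rho)\rho^{-1}\sum_{(i,j)}|Z_{ij}|^2\,|\langle\mathcal{P}_\Gamma(e_ie_j^*), e_{i_0}e_{j_0}^*\rangle|^2 \\
&\le (1-\rho)\rho^{-1}\|Z\|_\infty^2\,\|\mathcal{P}_\Gamma(e_{i_0}e_{j_0}^*)\|_F^2 \\
&\le (1-\rho)\rho^{-1}\gamma\|Z\|_\infty^2,
\end{aligned}$$

where the last inequality holds with high probability. Furthermore, we have

$$\begin{aligned}
|R_{ij}| &\le \rho^{-1}\|Z\|_\infty\,|\langle\mathcal{P}_\Gamma(e_ie_j^*), e_{i_0}e_{j_0}^*\rangle| \\
&\le \rho^{-1}\|Z\|_\infty\,\|\mathcal{P}_\Gamma(e_ie_j^*)\|_F\,\|\mathcal{P}_\Gamma(e_{i_0}e_{j_0}^*)\|_F \\
&\le \rho^{-1}\gamma\|Z\|_\infty
\end{aligned}$$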

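with high probability. Thus, using Bernstein's inequality, we obtain

$$\mathbb{P}\left[|Z'_{i_0j_0}| > \epsilon\|Z\|_\infty\right] \le 2\exp\left(-\frac{\epsilon^2\rho}{2\gamma\,(1-\rho+\epsilon/3)}\right).$$

Choosing $\epsilon \le 1$, we can reduce the above expression to

$$\mathbb{P}\left[|Z'_{i_0j_0}| > \epsilon\|Z\|_\infty\right] \le 2\exp\left(-\frac{3\epsilon^2\rho}{8\gamma}\right).$$

If $\rho \ge C_0\,\epsilon^{-2}\gamma\log m$ for some numerical constant $C_0 > 64/3$, then we have

$$\mathbb{P}\left[|Z'_{i_0j_0}| > \epsilon\|Z\|_\infty\right] \le 2\exp\left(-\frac{3C_0\log m}{8}\right).$$

Applying a union bound, we get

$$\mathbb{P}\left[\|Z'\|_\infty > \epsilon\|Z\|_\infty\right] \le 2mn\exp\left(-\frac{3C_0\log m}{8}\right). \tag{33}$$

Since $C_0 > 64/3$, we have $3C_0/8 > 8$, so (using $n \le m$) the right-hand side is at most $2m^{-6}$, and we obtain the desired result. □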
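The arithmetic behind (33) can be checked with a few lines of Python. The sketch assumes the variance proxy $(1-\rho)\rho^{-1}\gamma$ and almost-sure bound $\rho^{-1}\gamma$ derived above (both in units of $\|Z\|_\infty$); the values of $m$, $n$, $\gamma$ and $C_0$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def entry_tail(eps, rho, gamma):
    """Bernstein bound on P[|Z'_{i0 j0}| > eps ||Z||_inf], with variance proxy
    (1 - rho) gamma / rho and almost-sure bound gamma / rho."""
    return 2 * math.exp(-eps ** 2 * rho / (2 * gamma * (1 - rho + eps / 3)))

def union_tail(m, n, C0):
    """Right-hand side of (33): 2 m n exp(-3 C0 log(m) / 8)."""
    return 2 * m * n * math.exp(-3 * C0 * math.log(m) / 8)

m, n, gamma, eps, C0 = 1000, 500, 1e-4, 1.0, 22.0
rho = C0 * eps ** -2 * gamma * math.log(m)        # smallest rho allowed by Lemma 13
print(rho, entry_tail(eps, rho, gamma), union_tail(m, n, C0))
```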
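The following lemma is a restatement of Theorem 6.3 in [14].

Lemma 14. Let $Z \in \mathbb{R}^{m\times n}$ be fixed, and $\Omega \sim \mathrm{Ber}(\rho)$. Then, with high probability,

$$\|Z - \rho^{-1}\mathcal{P}_\Omega Z\| \le C_0'\sqrt{\frac{m\log m}{\rho}}\,\|Z\|_\infty, \tag{34}$$

provided that $\rho \ge C_0'\,\frac{\log m}{n}$, where $C_0' > 0$ is a numerical constant.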
4.2 Proof of Lemma 3

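Before proceeding to the actual proof, we introduce some additional notation. Let $Z_j = UV^* - \mathcal{P}_\Gamma Y_j$, where the $Y_j$'s are as defined earlier. Evidently, $Z_j \in \Gamma$ for all $j \ge 0$. The recursive relation between the $Z_j$'s can then be expressed as

$$Z_j = \left(\mathcal{P}_\Gamma - q^{-1}\mathcal{P}_\Gamma\mathcal{P}_{\Omega_j}\mathcal{P}_\Gamma\right)Z_{j-1}, \qquad Z_0 = UV^*. \tag{35}$$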

Let us assume that $\epsilon \in (0, e^{-1})$. From Lemma 13, we have that

$$\|Z_j\|_\infty \le \epsilon\|Z_{j-1}\|_\infty,$$

with high probability, provided that

$$q \ge C_0\,\epsilon^{-2}\gamma\log m, \tag{36}$$

where $C_0 > 64/3$ is a numerical constant. Since $Z_0 = UV^*$, with high probability, we have

$$\|Z_j\|_\infty \le \epsilon^j\|UV^*\|_\infty \le \epsilon^j\sqrt{\frac{\mu r}{mn}}.$$

The second inequality above follows from our assumptions about the matrices $U$ and $V$. Furthermore, when Eqn. (36) holds, we also have, with high probability,

$$\|Z_j\|_F \le \epsilon\|Z_{j-1}\|_F \tag{37}$$

using Lemma 12. Once again, since $Z_0 = UV^*$, we deduce that

$$\|Z_j\|_F \le \epsilon^j\|UV^*\|_F = \epsilon^j\sqrt{r} \tag{38}$$

with high probability.
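The recursion (35) is easy to simulate. The sketch below is a hypothetical illustration (assuming NumPy) that takes $\Gamma = T$, i.e. $Q^\perp = \{0\}$, and draws each $\Omega_j$ as an independent Bernoulli($q$) mask over all entries, ignoring the restriction of the $\Omega_j$ to $\Omega^c$. It records $\|Z_j\|_F$, which (37) predicts decays geometrically when $q$ is large enough relative to $\gamma\log m$.

```python
import numpy as np

def project_T(X, U, V):
    """P_T X = U U^T X + X V V^T - U U^T X V V^T."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ X + X @ PV - PU @ X @ PV

def golfing(U, V, q, j0, seed=0):
    """Y_j = Y_{j-1} + q^{-1} P_{Omega_j} Z_{j-1} and Z_j = UV^T - P_T Y_j, with independent
    Bernoulli(q) masks Omega_j. Returns Y_{j0} and ||Z_j||_F for j = 0, ..., j0."""
    rng = np.random.default_rng(seed)
    Z = U @ V.T                                   # Z_0 = UV^T
    Y = np.zeros_like(Z)
    norms = [np.linalg.norm(Z)]
    for _ in range(j0):
        mask = rng.random(Z.shape) < q            # Omega_j ~ Ber(q)
        Y = Y + mask * Z / q
        Z = U @ V.T - project_T(Y, U, V)          # equals (P_T - q^{-1} P_T P_Omega_j P_T) Z_{j-1}
        norms.append(np.linalg.norm(Z))
    return Y, norms

rng = np.random.default_rng(1)
m, n, r = 60, 50, 2
U = np.linalg.qr(rng.standard_normal((m, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
Y, norms = golfing(U, V, q=0.3, j0=8)
print([round(x, 4) for x in norms])
```

Setting $q = 1$ observes every entry in the first round, so $Z_1 = 0$; this is a simple way to confirm the update rule.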
4.2.1 Bounding $\|W^L\|$

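We first introduce a few notions before deriving a bound on $\|W^L\|$. We let $R$ denote the linear subspace obtained by projecting all the points in $Q^\perp$ onto $T^\perp$. By a slight abuse of notation, we denote this by

$$R = \mathcal{P}_{T^\perp}Q^\perp. \tag{39}$$

We note that if $Q^\perp$ is a random $p$-dimensional subspace in $\mathbb{R}^{m\times n}$, then with probability one, $R$ is a $p$-dimensional subspace of $T^\perp$. Since $R \perp T$ and $T + R = T + Q^\perp = \Gamma$, it is easy to verify that for any $X \in \mathbb{R}^{m\times n}$, we have

$$\mathcal{P}_\Gamma X = \mathcal{P}_T X + \mathcal{P}_R X.$$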
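The identity $\mathcal{P}_\Gamma = \mathcal{P}_T + \mathcal{P}_R$ can be verified numerically. The sketch below (assuming NumPy, with small arbitrary sizes and a random $Q^\perp$) compares the projector onto $\mathrm{span}(T \cup Q^\perp)$ with $\mathcal{P}_T + \mathcal{P}_R$, where $R = \mathcal{P}_{T^\perp}Q^\perp$. The helper names are illustrative.

```python
import numpy as np

def tangent_projector(U, V):
    """Matrix of P_T on row-major vec(X); uses vec(A X B) = kron(A, B^T) vec(X)."""
    m, n = U.shape[0], V.shape[0]
    PU, PV = U @ U.T, V @ V.T
    return np.kron(PU, np.eye(n)) + np.kron(np.eye(m), PV) - np.kron(PU, PV)

def check_gamma_split(m=8, n=7, r=1, p=5, seed=0):
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((m, r)))[0]
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    PT = tangent_projector(U, V)
    BQ = np.linalg.qr(rng.standard_normal((m * n, p)))[0]   # random Q^perp (dim p)
    BR = np.linalg.qr((np.eye(m * n) - PT) @ BQ)[0]         # R = P_{T^perp} Q^perp
    BT = np.linalg.svd(PT)[0][:, : r * (m + n - r)]         # orthonormal basis of T
    BG = np.linalg.qr(np.hstack([BT, BQ]))[0]               # Gamma = Q^perp + T
    return BG @ BG.T, PT + BR @ BR.T

P_gamma, P_sum = check_gamma_split()
print(np.abs(P_gamma - P_sum).max())
```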






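We note that

$$Y_{j_0} = \sum_{j=1}^{j_0} q^{-1}\mathcal{P}_{\Omega_j}Z_{j-1}. \tag{40}$$

Thus, since $Z_{j-1} \in \Gamma$ and $\mathcal{P}_{\Gamma^\perp} = \mathcal{I} - \mathcal{P}_\Gamma$, we have

$$\begin{aligned}
\|\mathcal{P}_{\Gamma^\perp}Y_{j_0}\| &= \Big\|\sum_{j=1}^{j_0}\mathcal{P}_{\Gamma^\perp}(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\Big\| \\
&\le \sum_{j=1}^{j_0}\|\mathcal{P}_\Gamma(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| + \sum_{j=1}^{j_0}\|(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\|.
\end{aligned}$$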
i=i i=i 

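The second term in the above inequality can be bounded with high probability using Lemma 14 as follows:

$$\sum_{j=1}^{j_0}\|(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| \le C_0'\sqrt{\frac{m\log m}{q}}\sum_{j=1}^{j_0}\|Z_{j-1}\|_\infty \le \frac{C_0'}{1-\epsilon}\sqrt{\frac{\mu r\log m}{qn}},$$

provided that

$$q \ge \max\left\{C_0'\,\frac{\log m}{n},\ C_0\,\epsilon^{-2}\gamma\log m\right\}.$$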



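On the other hand, each term in the summation in the first term can be split as

$$\begin{aligned}
\|\mathcal{P}_\Gamma(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| &\le \|\mathcal{P}_T(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| + \|\mathcal{P}_R(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| \\
&\le 2\|(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| + \|\mathcal{P}_R(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\|,
\end{aligned}$$

where the second inequality uses $\|\mathcal{P}_T X\| = \|X - \mathcal{P}_{T^\perp}X\| \le 2\|X\|$. We have already seen how the first term in the above inequality can be bounded with high probability. Hence, we now focus on the second term. We first state the matrix Bernstein inequality (see Theorem 1.4 in [22]) that will enable us to derive a bound on the second term.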
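Theorem 3 (Matrix Bernstein Inequality). Let $M_1, \ldots, M_k \in \mathbb{R}^{d_1\times d_2}$ be independent random matrices satisfying

$$\mathbb{E}[M_i] = 0, \qquad \|M_i\| \le B \ \text{almost surely}, \qquad i = 1, \ldots, k. \tag{41}$$

We set

$$\sigma^2 = \max\left\{\Big\|\sum_{i=1}^k\mathbb{E}[M_iM_i^*]\Big\|,\ \Big\|\sum_{i=1}^k\mathbb{E}[M_i^*M_i]\Big\|\right\}. \tag{42}$$

Then, for any $t > 0$, we have

$$\mathbb{P}\left[\Big\|\sum_{i=1}^k M_i\Big\| \ge t\right] \le (d_1 + d_2)\exp\left(-\frac{t^2/2}{\sigma^2 + Bt/3}\right). \tag{43}$$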
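For experimentation, the right-hand side of (43) and the level $t$ at which it equals a target failure probability $\delta$ can be computed directly. The function names below are hypothetical and the inputs are arbitrary; the threshold is the positive root of the quadratic $t^2/2 = L(\sigma^2 + Bt/3)$ with $L = \log((d_1+d_2)/\delta)$.

```python
import math

def matrix_bernstein_tail(t, sigma2, B, d1, d2):
    """Right-hand side of (43): (d1 + d2) exp(-(t^2 / 2) / (sigma^2 + B t / 3))."""
    return (d1 + d2) * math.exp(-(t * t / 2) / (sigma2 + B * t / 3))

def bernstein_threshold(delta, sigma2, B, d1, d2):
    """Positive root of t^2 - (2 L B / 3) t - 2 L sigma^2 = 0, L = log((d1 + d2) / delta)."""
    L = math.log((d1 + d2) / delta)
    b = 2 * L * B / 3
    return (b + math.sqrt(b * b + 8 * L * sigma2)) / 2

t = bernstein_threshold(1e-3, sigma2=2.0, B=0.5, d1=100, d2=80)
print(t, matrix_bernstein_tail(t, 2.0, 0.5, 100, 80))
```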



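Using Theorem 3, we will now show that, with high probability,

$$\|\mathcal{P}_R(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| \le C_p\sqrt{p}\,\log m\,\|Z_{j-1}\|_\infty.$$

The proof is as follows.

For every $(i,l) \in [m]\times[n]$, let us define $M_{il} = H_{il}(Z_{j-1})_{il}\,\mathcal{P}_R(e_ie_l^*)$, where the $H_{il}$'s are independent random variables distributed as follows:

$$H_{il} = \begin{cases} 1, & \text{w.p. } 1-q, \\ 1 - q^{-1}, & \text{w.p. } q. \end{cases}$$

We note that $\sum_{(i,l)} M_{il}$ has the same distribution as $\mathcal{P}_R(\mathcal{I} - q^{-1}\mathcal{P}_{\Omega_j})Z_{j-1}$. Since the $H_{il}$'s are independent zero-mean random variables that are independent of $Z_{j-1}$, we have that, for any $(i,l) \in [m]\times[n]$,

$$\mathbb{E}[M_{il}] = 0. \tag{44}$$

We record two useful bounds. We have that

$$1 - \rho = \mathbb{P}\Big[\bigcup_j\{(i,l) \in \Omega_j\}\Big] \le j_0q.$$

So $q \ge (1-\rho)/j_0$. Since $|H_{il}| \le q^{-1}$ almost surely, and $j_0 \le C\log m$, we have

$$|H_{il}| \le O(\log m) \quad \text{almost surely.} \tag{45}$$

We also have

$$\|\mathcal{P}_R e_ie_l^*\| \le \|\mathcal{P}_R e_ie_l^*\|_F \le 1,$$

for any $(i,l) \in [m]\times[n]$. It follows that

$$\|M_{il}\| \le O(\log m)\,\|Z_{j-1}\|_\infty \quad \text{almost surely.} \tag{46}$$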



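Now we bound the variance term. It can be shown that $\mathbb{E}[H_{il}^2] = q^{-1} - 1 = O(\log m)$. Let $B_1, \ldots, B_p$ be an orthonormal basis for $R$. Then, we have

$$\begin{aligned}
\Big\|\sum_{(i,l)}\mathbb{E}[M_{il}M_{il}^*]\Big\| &= \Big\|\sum_{(i,l)}\mathbb{E}[H_{il}^2]\,(Z_{j-1})_{il}^2\,\mathcal{P}_R[e_ie_l^*]\big(\mathcal{P}_R[e_ie_l^*]\big)^*\Big\| \qquad (47)\\
&\le O(\log m)\,\|Z_{j-1}\|_\infty^2\,\Big\|\sum_{(i,l)}\mathcal{P}_R[e_ie_l^*]\big(\mathcal{P}_R[e_ie_l^*]\big)^*\Big\| \qquad (48)\\
&= O(\log m)\,\|Z_{j-1}\|_\infty^2\,\Big\|\sum_{(i,l)}\Big(\sum_{s=1}^p (B_s)_{il}B_s\Big)\Big(\sum_{s=1}^p (B_s)_{il}B_s\Big)^*\Big\| \qquad (49)\\
&= O(\log m)\,\|Z_{j-1}\|_\infty^2\,\Big\|\sum_{s=1}^p B_sB_s^*\Big\| \qquad (50)\\
&\le O(\log m)\,p\,\|Z_{j-1}\|_\infty^2. \qquad (51)
\end{aligned}$$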
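A similar bound holds for the other variance term $\big\|\sum_{(i,l)}\mathbb{E}[M_{il}^*M_{il}]\big\|$.
Now, using the matrix Bernstein inequality, we have

$$\mathbb{P}\left[\|\mathcal{P}_R(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| > t \,\middle|\, Z_{j-1}, Q\right] \le (m+n)\exp\left(-\frac{t^2}{C_1\,p\log m\,\|Z_{j-1}\|_\infty^2 + C_2\log m\,\|Z_{j-1}\|_\infty\,t}\right).$$

Therefore, removing the conditioning, we have that, with high probability,

$$\|\mathcal{P}_R(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| \le C'\sqrt{p}\,\log m\,\|Z_{j-1}\|_\infty,$$

for any $j$, for some numerical constant $C' > 0$.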
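Thus, we have that, with high probability,

$$\begin{aligned}
\sum_{j=1}^{j_0}\|\mathcal{P}_R(q^{-1}\mathcal{P}_{\Omega_j} - \mathcal{I})Z_{j-1}\| &\le \sum_{j=1}^{j_0}C'\sqrt{p}\,\log m\,\|Z_{j-1}\|_\infty \\
&\le C'\sqrt{p}\,\log m\,\sqrt{\frac{\mu r}{mn}}\,(1-\epsilon)^{-1} \\
&= C'\log m\,\sqrt{\frac{\mu rp}{mn}}\,(1-\epsilon)^{-1}.
\end{aligned}$$

Under the assumptions of Theorem 1, the bound on the right-hand side can be made arbitrarily small. This gives us the desired bound.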

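4.2.2 Bounding $\|\mathcal{P}_\Omega(UV^* + W^L)\|_F$

We now prove the second part of Lemma 3. First, we note that $\mathcal{P}_\Omega Y_{j_0} = 0$ by construction. Therefore,

$$\mathcal{P}_\Omega(UV^* + \mathcal{P}_{\Gamma^\perp}Y_{j_0}) = \mathcal{P}_\Omega(UV^* - \mathcal{P}_\Gamma Y_{j_0}) = \mathcal{P}_\Omega Z_{j_0}. \tag{52}$$

Consequently, we have

$$\|\mathcal{P}_\Omega(UV^* + \mathcal{P}_{\Gamma^\perp}Y_{j_0})\|_F = \|\mathcal{P}_\Omega Z_{j_0}\|_F \le \|Z_{j_0}\|_F \le \epsilon^{j_0}\sqrt{r} \le \frac{\lambda}{4}.$$

The last step follows from the fact that $\epsilon < e^{-1}$ and $j_0 \ge 2\log m$, so that $\epsilon^{j_0}\sqrt{r} \le \sqrt{r}/m^2 \le m^{-3/2}$, which is at most $\lambda/4 = 1/(4\sqrt{m})$ for $m \ge 4$.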

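4.2.3 Bounding $\|\mathcal{P}_{\Omega^\perp}(UV^* + W^L)\|_\infty$

We now prove the final part of Lemma 3. We note that

$$UV^* + W^L = UV^* + \mathcal{P}_{\Gamma^\perp}Y_{j_0} = Y_{j_0} + Z_{j_0}. \tag{53}$$

Since we have already proved that $\|Z_{j_0}\|_\infty \le \lambda/8$, it is sufficient to show that $\|Y_{j_0}\|_\infty \le \lambda/8$. We have

$$\begin{aligned}
\|Y_{j_0}\|_\infty &= \Big\|\sum_{j=1}^{j_0} q^{-1}\mathcal{P}_{\Omega_j}Z_{j-1}\Big\|_\infty \le q^{-1}\sum_{j=1}^{j_0}\|\mathcal{P}_{\Omega_j}Z_{j-1}\|_\infty \\
&\le q^{-1}\sum_{j=1}^{j_0}\|Z_{j-1}\|_\infty \le q^{-1}\sum_{j=1}^{j_0}\epsilon^{j-1}\sqrt{\frac{\mu r}{mn}} \le \frac{q^{-1}}{1-\epsilon}\sqrt{\frac{\mu r}{mn}}.
\end{aligned}$$

Notice that $q \ge (1-\rho)/j_0 \ge 1/(4\log m)$ for $\rho \le 1/2$ and $j_0 = 2\log m$. Hence,

$$\|Y_{j_0}\|_\infty \le \frac{4\log m}{1-\epsilon}\sqrt{\frac{\mu r}{mn}} \le \frac{\lambda}{8},$$

for sufficiently small $\epsilon$, whenever $r \le C_r\,n/(\mu\log^2 m)$ for a suitable numerical constant $C_r$.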
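4.3 Proof of Lemma 4

We recall the notation that $\Gamma = Q^\perp \oplus T$. By Lemma 11, we have that

$$\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2 \le \frac{\|\mathcal{P}_\Omega\mathcal{P}_{Q^\perp}\|^2 + \|\mathcal{P}_\Omega\mathcal{P}_T\|^2}{1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_T\|}.$$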

Let us assume that $m, n$ are sufficiently large so that the following conditions hold true:

mn 4 



p , J{m + n)r\ ^ 1 ^^^^ 



mn V mn I 2 

2n ^ ^ fir logm 
P(l-p) > Co , (56) 



where $C_0 > 0$ is the numerical constant from Lemma 9. We also assume that $\rho \le 1/5$. We note that it is possible to satisfy all of the above inequalities under the assumptions on $p$ and $r$ given in Theorem 1, because $\rho$ is a fixed constant in the interval $(0,1)$. Using Lemma 11, it is easy to verify that under these assumptions, with high probability, we have that

$$\|\mathcal{P}_\Gamma\mathcal{P}_\Omega\| \le \eta\sqrt{\rho}, \tag{57}$$

where $\eta > 0$ is a numerical constant.

The basic steps of the proof closely follow that of Lemma 2.9 in [B]. We recognize that, using the convergent Neumann series, $W^S$ can be expressed as follows:

$$W^S = \lambda(\mathcal{I} - \mathcal{P}_\Gamma)\mathcal{P}_\Omega\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k[\mathrm{sgn}(S_0)]. \tag{58}$$

As mentioned earlier, we assume that the signs of the nonzero entries of $S_0$ are independent, symmetric $\pm1$ random variables.
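The defining properties of $W^S$ in (58) can be checked numerically: when the series converges, $\mathcal{P}_\Omega W^S = \lambda\,\mathrm{sgn}(S_0)$ and $\mathcal{P}_\Gamma W^S = 0$. The sketch below assumes NumPy and specializes to $\Gamma = T$ ($Q^\perp = \{0\}$); the sizes, $\rho$, and truncation tolerance are illustrative choices, and the function refuses to run if $\|\mathcal{P}_\Omega\mathcal{P}_T\| \ge 1$, since the series then need not converge.

```python
import numpy as np

def tangent_projector(U, V):
    """Matrix of P_T on row-major vec(X); vec(A X B) = kron(A, B^T) vec(X)."""
    m, n = U.shape[0], V.shape[0]
    PU, PV = U @ U.T, V @ V.T
    return np.kron(PU, np.eye(n)) + np.kron(np.eye(m), PV) - np.kron(PU, PV)

def build_WS(m=20, n=20, r=1, rho=0.1, seed=0, tol=1e-13, max_iter=10_000):
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((m, r)))[0]
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    PT = tangent_projector(U, V)
    omega = (rng.random((m, n)) < rho).ravel()                      # support Omega
    s = np.where(omega, rng.choice([-1.0, 1.0], size=m * n), 0.0)   # vec(sgn(S_0))
    lam = 1 / np.sqrt(m)
    ratio = np.linalg.norm(PT[:, omega], 2) ** 2                    # ||P_Omega P_T||^2
    if ratio >= 1:
        raise ValueError("||P_Omega P_T|| >= 1: the Neumann series need not converge")
    total, term = s.copy(), s.copy()
    for _ in range(max_iter):
        term = omega * (PT @ (omega * term))      # next power of P_Omega P_T P_Omega applied to s
        total += term
        if np.linalg.norm(term) < tol:
            break
    W = lam * (total - PT @ total)                # lambda (I - P_T) P_Omega sum_k (...)^k s
    return W, s, omega, PT, lam, ratio

W, s, omega, PT, lam, ratio = build_WS()
print(f"||P_Omega P_T||^2 = {ratio:.3f}")
```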



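4.3.1 Bounding $\|W^S\|$

It is easy to show that $W^S$ splits into two components, $W^S = W_1^S + W_2^S$, where $W_1^S$ collects the $k = 0$ term of the series in (58) and $W_2^S$ collects the terms with $k \ge 1$.

We now show that each of these components has spectral norm smaller than 1/16 with high probability. This gives us the desired bound on $\|W^S\|$.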

For the first term, we can use standard arguments about the norms of random matrices with i.i.d. entries (see [23]) to show that, with high probability,

$$\|\mathrm{sgn}(S_0)\| \le 4\sqrt{\rho m}.$$

Since $\lambda = m^{-1/2}$, we have that $\|W_1^S\| \le 4\sqrt{\rho}$ with high probability. Thus, for sufficiently small $\rho$, we have that $\|W_1^S\| < 1/16$.
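The bound on $\|\mathrm{sgn}(S_0)\|$ is easy to probe empirically. The sketch below (assuming NumPy; sizes, $\rho$ and seed are arbitrary) draws $\Omega \sim \mathrm{Ber}(\rho)$ with independent symmetric signs and compares the spectral norm with $4\sqrt{\rho m}$; a single draw is of course only an illustration of a with-high-probability statement.

```python
import numpy as np

def sign_pattern_norm(m, n, rho, seed=0):
    rng = np.random.default_rng(seed)
    support = rng.random((m, n)) < rho                 # Omega ~ Ber(rho)
    signs = rng.choice([-1.0, 1.0], size=(m, n))       # independent symmetric signs
    S = np.where(support, signs, 0.0)                  # sgn(S_0)
    return np.linalg.norm(S, 2), 4 * np.sqrt(rho * m)

m, n, rho = 300, 200, 0.1
norm, bound = sign_pattern_norm(m, n, rho)
lam = 1 / np.sqrt(m)
print(f"||sgn(S0)|| = {norm:.2f} <= {bound:.2f};  lambda * ||sgn(S0)|| = {lam * norm:.3f}")
```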

We use a discretization argument to bound $\|W_2^S\|$. Let $N_m$ and $N_n$ be 1/2-nets for the unit spheres in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. It can be shown that the sizes of $N_m$ and $N_n$ are at most $6^m$ and $6^n$,
respectively (see Theorem 4.16 in ^2]). Then, we have that 

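$$\begin{aligned}
\|W_2^S\| &\le 4\max_{x\in N_m,\,y\in N_n} x^*W_2^Sy = 4\max_{x\in N_m,\,y\in N_n}\langle xy^*, W_2^S\rangle \\
&= 4\max_{x\in N_m,\,y\in N_n}\Big\langle xy^*,\ \lambda\mathcal{P}_{\Gamma^\perp}\sum_{k\ge1}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k[\mathrm{sgn}(S_0)]\Big\rangle \\
&= 4\lambda\max_{x\in N_m,\,y\in N_n}\Big\langle\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega\mathcal{P}_{\Gamma^\perp}[xy^*],\ \mathrm{sgn}(S_0)\Big\rangle \\
&= 4\lambda\max_{x\in N_m,\,y\in N_n}\langle H(x,y), \mathrm{sgn}(S_0)\rangle.
\end{aligned}$$

For any $(x,y) \in N_m\times N_n$, we bound $\|H(x,y)\|_F$ as follows:

$$\|H(x,y)\|_F \le \sum_{k\ge0}\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^{2k}\,\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega\|\,\|xy^*\|_F = \frac{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2}{1 - \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2},$$

since $\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega\| = \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2$, $\|\mathcal{P}_{\Gamma^\perp}\| \le 1$ and $\|xy^*\|_F = 1$.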

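Conditioned on $Q$ and $\Omega$, we use Hoeffding's inequality to get

$$\mathbb{P}\left[|\langle H(x,y), \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega, Q\right] \le 2\exp\left(-\frac{t^2}{2\|H(x,y)\|_F^2}\right). \tag{59}$$

Subsequently, using a union bound over $N_m\times N_n$, we obtain

$$\mathbb{P}\left[\max_{x\in N_m,\,y\in N_n}|\langle H(x,y), \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega, Q\right] \le 2\cdot6^{m+n}\exp\left(-\frac{t^2}{2\max_{x\in N_m,\,y\in N_n}\|H(x,y)\|_F^2}\right) \tag{60}$$

$$\le 2\cdot6^{m+n}\exp\left(-\frac{t^2\,(1-\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2)^2}{2\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^4}\right). \tag{61}$$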



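Let $E_1$ be the event $\{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\| \le \eta\sqrt{\rho}\}$. We know from (57) that this event occurs with high probability. Thus, removing the conditioning on $\Omega$ and $Q$, we have

$$\mathbb{P}\left[\max_{x\in N_m,\,y\in N_n}|\langle H(x,y), \mathrm{sgn}(S_0)\rangle| > t\right] \le 2\cdot6^{m+n}\exp\left(-\frac{t^2\,(1-\eta^2\rho)^2}{2\eta^4\rho^2}\right) + \mathbb{P}[E_1^c].$$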
rfp 
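Therefore,

$$\mathbb{P}\left[4\lambda\max_{x\in N_m,\,y\in N_n}|\langle H(x,y), \mathrm{sgn}(S_0)\rangle| > 4\lambda t\right] \le 2\cdot6^{m+n}\exp\left(-\frac{t^2\,(1-\eta^2\rho)^2}{2\eta^4\rho^2}\right) + \mathbb{P}[E_1^c].$$

Setting $t = 2s\sqrt{m}\,\eta^2\rho/(1-\eta^2\rho)$, substituting $\lambda = 1/\sqrt{m}$, and using $6^{m+n} \le 6^{2m}$ (as $n \le m$), we get

$$\mathbb{P}\left[\|W_2^S\| > \frac{8s\eta^2\rho}{1-\eta^2\rho}\right] \le 2\exp\big(2m(\log 6 - s^2)\big) + \mathbb{P}[E_1^c].$$

Let us choose any $s > \sqrt{\log 6}$. Then, for sufficiently small $\rho$, we have that $\|W_2^S\| < 1/16$ with high probability.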
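The two closed forms in the final step, the union-bound failure probability and the threshold for $\|W_2^S\|$, can be evaluated directly. The sketch below uses the constants exactly as written above; the values of $m$, $s$, $\eta$ and $\rho$ are illustrative, and the function names are hypothetical.

```python
import math

def net_failure_bound(m, s):
    """2 exp(2 m (log 6 - s^2)): union bound over at most 6^(m+n) <= 6^(2m) net points."""
    return 2 * math.exp(2 * m * (math.log(6) - s * s))

def w2_level(s, eta, rho):
    """Level 8 s eta^2 rho / (1 - eta^2 rho) that ||W_2^S|| exceeds only with small probability."""
    return 8 * s * eta ** 2 * rho / (1 - eta ** 2 * rho)

print(net_failure_bound(100, 2.0), w2_level(2.0, 1.0, 0.003))
```

For $s \le \sqrt{\log 6}$ the failure bound exceeds 1 and is vacuous, which is why $s > \sqrt{\log 6}$ is required.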

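4.3.2 Bounding $\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty$

Once again, using the convergent Neumann series expansion for $W^S$, we have

$$\begin{aligned}
\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty &= \max_{(i,j)\in\Omega^c}\Big|\Big\langle e_ie_j^*,\ \lambda(\mathcal{I}-\mathcal{P}_\Gamma)\mathcal{P}_\Omega\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k[\mathrm{sgn}(S_0)]\Big\rangle\Big| \\
&= \lambda\max_{(i,j)\in\Omega^c}\Big|\Big\langle\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k\mathcal{P}_\Omega\mathcal{P}_\Gamma[e_ie_j^*],\ \mathrm{sgn}(S_0)\Big\rangle\Big| \\
&= \lambda\max_{(i,j)\in\Omega^c}|\langle X_{ij}, \mathrm{sgn}(S_0)\rangle|.
\end{aligned}$$

Conditioned on $Q$ and $\Omega$, we use Hoeffding's inequality to get

$$\mathbb{P}\left[|\langle X_{ij}, \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega, Q\right] \le 2\exp\left(-\frac{t^2}{2\|X_{ij}\|_F^2}\right).$$

Using a union bound, we obtain

$$\mathbb{P}\left[\max_{ij}|\langle X_{ij}, \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega, Q\right] \le 2mn\exp\left(-\frac{t^2}{2\max_{ij}\|X_{ij}\|_F^2}\right).$$

We obtain a bound on $\|X_{ij}\|_F$ as follows:

$$\|X_{ij}\|_F \le \sum_{k\ge0}\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^{2k}\,\|\mathcal{P}_\Omega\mathcal{P}_\Gamma[e_ie_j^*]\|_F \le \frac{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|\,\|\mathcal{P}_\Gamma[e_ie_j^*]\|_F}{1 - \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2}.$$

Thus, we get

$$\mathbb{P}\left[\max_{ij}|\langle X_{ij}, \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega, Q\right] \le 2mn\exp\left(-\frac{t^2\,(1-\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2)^2}{2\,\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2\max_{ij}\|\mathcal{P}_\Gamma[e_ie_j^*]\|_F^2}\right).$$

Choosing $t$ so that the exponent equals $s\log(mn)$ and removing the conditioning on $Q$ and $\Omega$, we get

$$\mathbb{P}\left[\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty > \lambda\sqrt{2s\log(mn)}\;\frac{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|\max_{ij}\|\mathcal{P}_\Gamma[e_ie_j^*]\|_F}{1 - \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2}\right] \le 2(mn)^{1-s}.$$

Consider the two events:

$$E_1 := \{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\| \le \eta\sqrt{\rho}\}, \qquad E_2 := \Big\{\max_{ij}\|\mathcal{P}_\Gamma e_ie_j^*\|_F \le \sqrt{\gamma}\Big\},$$

where we recall that $\gamma$ is given by the right-hand side of (27). We have already shown that $E_1$ and $E_2$ occur with high probability. Substituting for the various bounds and setting $s = 2$, we get

$$\mathbb{P}\left[\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty > \frac{2\lambda\eta\sqrt{\rho\,\gamma\log(mn)}}{1-\eta^2\rho}\right] \le \frac{2}{mn} + \mathbb{P}[E_1^c] + \mathbb{P}[E_2^c].$$

Under the conditions of Theorem 1 and for sufficiently large $m, n$ and sufficiently small $\rho$, we get that $\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty < \lambda/8$ with high probability.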

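4.4 Proof of Lemma 5

4.4.1 Bounding $\|W^Q\|$

Using the convergent Neumann series expansion, we can write the analytical expression for $W^Q$ as follows:

$$W^Q = \mathcal{P}_{\Pi^\perp}\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\big(\mathcal{P}_{Q^\perp}(-UV^*)\big), \tag{62}$$

where we recall that $\Pi = \Omega \oplus T$. It follows that

$$\|W^Q\|_F \le \Big\|\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\Big\|\,\|\mathcal{P}_{Q^\perp}(UV^*)\|_F.$$

Considering the first term of the product on the right-hand side,

$$\Big\|\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\Big\| \le \sum_{k\ge0}\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp}\|^k = \sum_{k\ge0}\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^{2k}.$$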

From Lemma 11, we have that, for any $\epsilon > 0$, with high probability,

64 

< 



1- v^P+i 



{m + n)r 



Assume that $\rho \le 1/4$, and fix $\epsilon = 3\rho$. For $m, n$ large enough, we can assume that $\max\{p/mn,\ r(m+n)/mn\} \le \rho$. Then, we have that, with high probability,



,^ ^ 1,2 832fl 
Therefore, for sufficiently small $\rho$, we have that

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^2 \le \frac{1}{4} \tag{63}$$

with high probability. Consequently,

$$\sum_{k\ge0}\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^{2k} \le \frac{4}{3} \tag{64}$$

with high probability.

We bound $\|\mathcal{P}_{Q^\perp}(UV^*)\|_F$ as follows. As explained earlier, if we vectorize all matrices, then $\mathcal{P}_{Q^\perp}$ has the same distribution as $H(H^*H)^{-1}H^*$, where $H \in \mathbb{R}^{mn\times p}$ is a random Gaussian matrix with i.i.d. entries $\sim\mathcal{N}(0, 1/mn)$. Therefore, we have

$$\|\mathcal{P}_{Q^\perp}(UV^*)\|_F = \|H(H^*H)^{-1}H^*\mathrm{vec}(UV^*)\|_2 \le \|H(H^*H)^{-1}\|\,\|H^*\mathrm{vec}(UV^*)\|_2,$$

where the above equality is in distribution. We have already shown in the proof of Lemma 7 that

$$\mathbb{P}\left[\|H(H^*H)^{-1}\| > 4\right] \le e^{-mn/32}.$$

We note that $H^*\mathrm{vec}(UV^*)$ is a $p$-dimensional vector whose components are i.i.d. and have the same distribution as $\langle G, UV^*\rangle$, where $G \in \mathbb{R}^{m\times n}$ is a random Gaussian matrix whose entries are i.i.d. $\sim\mathcal{N}(0, 1/mn)$. It is easy to see that $\langle G, UV^*\rangle$ is distributed according to $\mathcal{N}(0, r/mn)$, and therefore, we have

$$\mathbb{E}\big[\|H^*\mathrm{vec}(UV^*)\|_F\big] \le \left(\mathbb{E}\big[\|H^*\mathrm{vec}(UV^*)\|_F^2\big]\right)^{1/2} = \sqrt{\frac{pr}{mn}}.$$
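The distributional fact used above, $\langle G, UV^*\rangle \sim \mathcal{N}(0, r/mn)$, can be checked by Monte Carlo. The sketch assumes NumPy and uses small arbitrary sizes; the empirical variance should be close to $r/mn$ (the tolerance in the check is deliberately loose), and summing $p$ independent squared copies gives the $pr/mn$ used in the expectation bound.

```python
import numpy as np

def inner_product_stats(m=20, n=15, r=3, trials=5000, seed=0):
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((m, r)))[0]
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    uv = (U @ V.T).ravel()                                           # vec(UV^*), squared norm r
    G = rng.normal(0.0, 1.0 / np.sqrt(m * n), size=(trials, m * n))  # each row is vec(G)
    samples = G @ uv                                                 # <G, UV^*> for each trial
    return samples.mean(), samples.var(), r / (m * n)

mean, var, predicted = inner_product_stats()
print(f"mean={mean:.2e}  var={var:.3e}  r/mn={predicted:.3e}")
```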



Since $\|\cdot\|_F$ is a 1-Lipschitz function, we use Proposition 2.18 in



to get 



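$$\mathbb{P}\Big[\|H^*\mathrm{vec}(UV^*)\|_F \ge \mathbb{E}\big[\|H^*\mathrm{vec}(UV^*)\|_F\big] + t\Big] \le \exp\left(-\frac{mn\,t^2}{2r}\right),$$

where we used that $H \mapsto \|H^*\mathrm{vec}(UV^*)\|_F$ is $\|UV^*\|_F = \sqrt{r}$-Lipschitz in the Frobenius norm of $H$, whose entries have variance $1/mn$. Setting $t = \sqrt{6r\log m/(mn)}$, we get

$$\mathbb{P}\left[\|H^*\mathrm{vec}(UV^*)\|_F > \sqrt{\frac{pr}{mn}} + \sqrt{\frac{6r\log m}{mn}}\right] \le \frac{1}{m^3}. \tag{65}$$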



Putting it all together, we conclude that

$$\|W^Q\| \le \|W^Q\|_F \le \frac{16}{3}\left(\sqrt{\frac{pr}{mn}} + \sqrt{\frac{6r\log m}{mn}}\right) \tag{66}$$

with high probability. Clearly, for sufficiently large $m$, the right-hand side can be made arbitrarily small under the conditions of Theorem 1, and hence, we have the desired bound.

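4.4.2 Controlling $\|\mathcal{P}_{\Omega^\perp}W^Q\|_\infty$

It is easy to show that the analytical expression for $W^Q$ can be written slightly differently as follows:

$$W^Q = \mathcal{P}_{\Pi^\perp}\mathcal{P}_{Q^\perp}\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\big(\mathcal{P}_{Q^\perp}(-UV^*)\big). \tag{67}$$

Consider any $(i,j) \in [m]\times[n]$. Then,

$$\begin{aligned}
|\langle e_ie_j^*, W^Q\rangle| &= \Big|\Big\langle\mathcal{P}_{Q^\perp}\mathcal{P}_{\Pi^\perp}e_ie_j^*,\ \sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\big(\mathcal{P}_{Q^\perp}(UV^*)\big)\Big\rangle\Big| \\
&\le \Big\|\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\Big\|\,\|\mathcal{P}_{Q^\perp}(UV^*)\|_F\,\|\mathcal{P}_{Q^\perp}\mathcal{P}_{\Pi^\perp}e_ie_j^*\|_F.
\end{aligned}$$

We have already derived bounds for the first two terms. For the final term in the product, we use the same technique we employed to bound $\|\mathcal{P}_{Q^\perp}(UV^*)\|_F$. Using the fact that $\|\mathcal{P}_{\Pi^\perp}e_ie_j^*\|_F \le 1$, we can show that

$$\mathbb{P}\left[\|\mathcal{P}_{Q^\perp}\mathcal{P}_{\Pi^\perp}e_ie_j^*\|_F > 4\left(\sqrt{\frac{p}{mn}} + \sqrt{\frac{6\log m}{mn}}\right)\right] \le \frac{1}{m^3} + e^{-mn/32}.$$

Using a union bound, we get

$$\mathbb{P}\left[\max_{ij}\|\mathcal{P}_{Q^\perp}\mathcal{P}_{\Pi^\perp}e_ie_j^*\|_F > 4\left(\sqrt{\frac{p}{mn}} + \sqrt{\frac{6\log m}{mn}}\right)\right] \le mn\left(\frac{1}{m^3} + e^{-mn/32}\right).$$

Putting all the bounds together, we have that, with high probability,

$$\|\mathcal{P}_{\Omega^\perp}W^Q\|_\infty \le \frac{64\sqrt{r}}{3\,mn}\left(\sqrt{p} + \sqrt{6\log m}\right)^2. \tag{68}$$

Since $\lambda = m^{-1/2}$, it is easy to show that under the assumptions of Theorem 1 the right-hand side in the above inequality can be made smaller than $C\lambda$, for any fixed $C > 0$. Thus, we have the desired bound.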



5 Deterministic Reduction: Proof of Theorem 2

In this section, we provide the proof for Theorem 2 under the deterministic subspace model for $Q^\perp$. We will adopt the same optimality conditions established in Lemma 2 and the same proof strategy outlined in Section 3, namely the construction of $W = W^L + W^S + W^Q$. To avoid redundancy, wherever possible, we will only highlight the parts that differ from the previous proof in Section 4 and refer the interested reader to Section 4 for more details. First, we derive the various incoherence relations associated with our fixed subspace $Q^\perp$. Then, we will prove Lemmas 3, 4 and 5 using these relations.

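5.1 Preliminaries

In this subsection, we provide several lemmas that will be used later in our proof.

Lemma 15. Let $Q^\perp$ be $\nu$-coherent. If $X \in \mathbb{R}^{m\times n}$ is a rank-$r$ matrix, then

$$\|\mathcal{P}_{Q^\perp}X\|_F^2 \le \frac{\nu pr}{n}\|X\|_F^2. \tag{69}$$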
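Proof. Let $G_1, \ldots, G_p$ be an orthonormal basis for $Q^\perp$ with $\max_i\|G_i\|^2 \le \nu/n$. Using $|\langle G_i, X\rangle| \le \|G_i\|\,\|X\|_*$ and $\|X\|_* \le \sqrt{r}\,\|X\|_F$, we have

$$\begin{aligned}
\|\mathcal{P}_{Q^\perp}X\|_F^2 &= \sum_{i=1}^p\langle G_i, X\rangle^2 \le \sum_{i=1}^p\|G_i\|^2\,\|X\|_*^2 \\
&\le p\left(\max_i\|G_i\|^2\right)\|X\|_*^2 \\
&\le pr\left(\max_i\|G_i\|^2\right)\|X\|_F^2 \le \frac{\nu pr}{n}\|X\|_F^2.
\end{aligned}$$

□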
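Lemma 15 holds deterministically, so it can be checked on any instance. In the sketch below (assuming NumPy; sizes arbitrary), a random $Q^\perp$ is generated, $\nu$ is set to the smallest value with $\max_k\|G_k\|^2 \le \nu/n$, and the two sides of (69) are compared for a random rank-$r$ matrix.

```python
import numpy as np

def lemma15_check(m=12, n=10, p=6, r=2, seed=0):
    rng = np.random.default_rng(seed)
    BQ = np.linalg.qr(rng.standard_normal((m * n, p)))[0]        # columns: vec(G_k)
    Gs = [BQ[:, k].reshape(m, n) for k in range(p)]               # orthonormal basis of Q^perp
    nu = n * max(np.linalg.norm(G, 2) ** 2 for G in Gs)           # smallest nu with max ||G_k||^2 <= nu/n
    X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) # a rank-r matrix
    lhs = sum(np.sum(G * X) ** 2 for G in Gs)                     # ||P_{Q^perp} X||_F^2
    rhs = nu * p * r / n * np.linalg.norm(X) ** 2
    return lhs, rhs, nu

lhs, rhs, nu = lemma15_check()
print(f"nu = {nu:.2f}: {lhs:.3f} <= {rhs:.3f}")
```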

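Corollary 1. For any $\nu$-coherent subspace $Q^\perp$, we have the following:

1. $\|\mathcal{P}_{Q^\perp}e_ie_j^*\|_F^2 \le \frac{\nu p}{n}$;

2. $\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\|^2 \le \frac{2\nu pr}{n}$;

3. $\|\mathcal{P}_{Q^\perp}(UV^*)\|_F^2 \le \frac{2\nu pr^2}{n}$.

Proof. The first two results follow from the fact that the $e_ie_j^*$ are rank-1 matrices, and $\mathrm{rank}(\mathcal{P}_T X) \le 2r$ for all $X \in \mathbb{R}^{m\times n}$. The last result can be derived from the second one as shown below:

$$\|\mathcal{P}_{Q^\perp}(UV^*)\|_F^2 = \|\mathcal{P}_{Q^\perp}\mathcal{P}_T(UV^*)\|_F^2 \le \|\mathcal{P}_{Q^\perp}\mathcal{P}_T\|^2\,\|UV^*\|_F^2 \le \frac{2\nu pr^2}{n}.$$

□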

Lemma 16. Under the assumptions made in Theorem 2, we have that

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\| \le 1/2, \tag{70}$$

with high probability, provided that $\rho \le \rho_0$ and $\nu^2p^2\log m/n \le C$. Here, $C > 0$ and $\rho_0 \in (0,1)$ are numerical constants.

Proof. Please refer to Section 5.5 for a detailed proof. □



5.2 Proof of Lemma 3 (deterministic case)

We use the same framework from Section 4.2 to bound the corresponding norms of $W^L$. We note that to bound $\|W^L\|$ in the previous case, the only key property of $Q^\perp$ that was critical to the proof was that $\Gamma = Q^\perp \oplus T$ is $O(\mu r/n)$-constrained. More specifically, the latter property is used in Lemma 12 and Lemma 13.

In the deterministic case, by assumption, $Q^\perp$ is $\nu$-coherent, where $\nu$ is a constant. In the following lemma, we will show that $\Gamma$ is $O(\mu r/n)$-constrained as well under our assumptions. As a result, the proof of Lemma 3 for the random subspace model can be directly adopted for the deterministic case.

Lemma 17. If $Q^\perp$ is $\nu$-coherent, then

$$\|\mathcal{P}_\Gamma e_ie_j^*\|_F \le 4\left(\sqrt{\frac{\nu p}{n}} + \sqrt{\frac{2\mu r}{n}}\right). \tag{71}$$

In other words, if $Q^\perp$ is $\nu$-coherent, then $\Gamma$ is $\gamma$-constrained for $\gamma = 16\left(\sqrt{\nu p/n} + \sqrt{2\mu r/n}\right)^2$.
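Proof. Let us assume that $\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\| \le 1/2$. This is true for sufficiently large $n$ under the assumptions of Theorem 2 (see Corollary 1). Using the convergent Neumann series expansion, it is possible to show that

$$\mathcal{P}_\Gamma e_ie_j^* = \left[(\mathcal{I} - \mathcal{P}_{Q^\perp}\mathcal{P}_T)^{-1}\mathcal{P}_{Q^\perp}\mathcal{P}_{T^\perp} + (\mathcal{I} - \mathcal{P}_T\mathcal{P}_{Q^\perp})^{-1}\mathcal{P}_T\mathcal{P}_Q\right](e_ie_j^*),$$

and therefore,

$$\|\mathcal{P}_\Gamma e_ie_j^*\|_F \le \|(\mathcal{I} - \mathcal{P}_{Q^\perp}\mathcal{P}_T)^{-1}\|\,\|\mathcal{P}_{Q^\perp}\mathcal{P}_{T^\perp}(e_ie_j^*)\|_F + \|(\mathcal{I} - \mathcal{P}_T\mathcal{P}_{Q^\perp})^{-1}\|\,\|\mathcal{P}_T\mathcal{P}_Q(e_ie_j^*)\|_F.$$

From Eqn. (5) and Corollary 1, we have

$$\begin{aligned}
\|\mathcal{P}_{Q^\perp}\mathcal{P}_{T^\perp}(e_ie_j^*)\|_F &\le \|\mathcal{P}_{Q^\perp}(e_ie_j^*)\|_F + \|\mathcal{P}_{Q^\perp}\mathcal{P}_T(e_ie_j^*)\|_F \\
&\le \|\mathcal{P}_{Q^\perp}(e_ie_j^*)\|_F + \|\mathcal{P}_T(e_ie_j^*)\|_F \le \sqrt{\frac{\nu p}{n}} + \sqrt{\frac{2\mu r}{n}}.
\end{aligned}$$

Similarly, we have

$$\|\mathcal{P}_T\mathcal{P}_Q(e_ie_j^*)\|_F \le \sqrt{\frac{\nu p}{n}} + \sqrt{\frac{2\mu r}{n}}.$$

We also have that

$$\|(\mathcal{I} - \mathcal{P}_T\mathcal{P}_{Q^\perp})^{-1}\| \le \sum_{k\ge0}\|\mathcal{P}_T\mathcal{P}_{Q^\perp}\|^k \le 2,$$

and the same bound holds for $\|(\mathcal{I} - \mathcal{P}_{Q^\perp}\mathcal{P}_T)^{-1}\|$. Hence, we have

$$\|\mathcal{P}_\Gamma e_ie_j^*\|_F \le 4\left(\sqrt{\frac{\nu p}{n}} + \sqrt{\frac{2\mu r}{n}}\right). \tag{72}$$

□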
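The decomposition of $\mathcal{P}_\Gamma$ used in the proof of Lemma 17 is an identity for the projector onto the sum of two subspaces with trivial intersection. The sketch below (assuming NumPy; $\mathbb{R}^{30}$ stands in for $\mathbb{R}^{m\times n}$, with random subspaces standing in for $Q^\perp$ and $T$) compares it with a projector computed directly from an orthonormal basis.

```python
import numpy as np

def random_projector(rng, dim, k):
    B, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return B @ B.T, B

def sum_projector_neumann(P1, P2):
    """(I - P1 P2)^{-1} P1 (I - P2) + (I - P2 P1)^{-1} P2 (I - P1); valid when ||P1 P2|| < 1."""
    I = np.eye(P1.shape[0])
    return (np.linalg.solve(I - P1 @ P2, P1 @ (I - P2))
            + np.linalg.solve(I - P2 @ P1, P2 @ (I - P1)))

rng = np.random.default_rng(0)
P1, B1 = random_projector(rng, 30, 4)      # stands in for P_{Q^perp}
P2, B2 = random_projector(rng, 30, 6)      # stands in for P_T
B, _ = np.linalg.qr(np.hstack([B1, B2]))   # orthonormal basis of S1 + S2
P_direct = B @ B.T
P_formula = sum_projector_neumann(P1, P2)
print(np.abs(P_direct - P_formula).max())
```

On a vector in $S_1$ the first term returns it unchanged and the second vanishes (and symmetrically for $S_2$), which is why the two matrices agree.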



It can be easily shown that the results in Section 4.1 all hold for the deterministic case as well with the modified value for $\gamma$ derived above. Consequently, the proof of Lemma 3 from Section 4.2 can be directly adopted for the deterministic case as well.

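5.3 Proof of Lemma 4 (deterministic case)

We now provide a proof of Lemma 4 under our deterministic subspace model. Since the basic framework of the proof is very similar to that in Section 4.3, we will derive only the important steps here and refer the interested reader to Section 4.3 for more details.

Controlling $\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty$. Using the convergent Neumann series, we have

$$W^S = \lambda(\mathcal{I} - \mathcal{P}_\Gamma)\mathcal{P}_\Omega\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k[\mathrm{sgn}(S_0)].$$

Therefore, we have

$$\begin{aligned}
\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty &= \lambda\max_{(i,j)\in\Omega^c}\Big|\Big\langle e_ie_j^*,\ (\mathcal{I} - \mathcal{P}_\Gamma)\mathcal{P}_\Omega\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k[\mathrm{sgn}(S_0)]\Big\rangle\Big| \\
&= \lambda\max_{(i,j)\in\Omega^c}\Big|\Big\langle\sum_{k\ge0}(\mathcal{P}_\Omega\mathcal{P}_\Gamma\mathcal{P}_\Omega)^k\mathcal{P}_\Omega\mathcal{P}_\Gamma(e_ie_j^*),\ \mathrm{sgn}(S_0)\Big\rangle\Big| \\
&= \lambda\max_{(i,j)\in\Omega^c}|\langle H^{(i,j)}, \mathrm{sgn}(S_0)\rangle|.
\end{aligned}$$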
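We now bound $\|H^{(i,j)}\|_F$ as follows:

$$\|H^{(i,j)}\|_F \le \sum_{k\ge0}\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^{2k}\,\|\mathcal{P}_\Omega\mathcal{P}_\Gamma(e_ie_j^*)\|_F \le \frac{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|\,\|\mathcal{P}_\Gamma(e_ie_j^*)\|_F}{1 - \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2}.$$

Conditioned on $\Omega$, using Hoeffding's inequality, we have

$$\mathbb{P}\left[|\langle H^{(i,j)}, \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega\right] \le 2\exp\left(-\frac{t^2}{2\|H^{(i,j)}\|_F^2}\right).$$

Applying a union bound, we get

$$\begin{aligned}
\mathbb{P}\left[\max_{i,j}|\langle H^{(i,j)}, \mathrm{sgn}(S_0)\rangle| > t \,\middle|\, \Omega\right] &\le 2mn\exp\left(-\frac{t^2}{2\max_{i,j}\|H^{(i,j)}\|_F^2}\right) \\
&\le 2mn\exp\left(-\frac{t^2\,(1 - \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2)^2}{2\,\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2\max_{i,j}\|\mathcal{P}_\Gamma(e_ie_j^*)\|_F^2}\right).
\end{aligned}$$

Removing the conditioning on $\Omega$, we get

$$\mathbb{P}\left[\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty > \lambda\sqrt{2s\log(mn)}\;\frac{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|\max_{i,j}\|\mathcal{P}_\Gamma(e_ie_j^*)\|_F}{1 - \|\mathcal{P}_\Omega\mathcal{P}_\Gamma\|^2}\right] \le 2(mn)^{1-s},$$

where $s > 0$. Consider the event $E := \{\|\mathcal{P}_\Omega\mathcal{P}_\Gamma\| \le \eta\sqrt{\rho}\}$. Just like under the random subspace model, it is not difficult to show that the event $E$ occurs with high probability for some fixed $\eta > 0$. Furthermore, we have already shown that $\Gamma$ is a $\gamma$-constrained subspace with $\gamma\log m = O(1/\log m)$. Setting $s = 2$, we get

$$\mathbb{P}\left[\|\mathcal{P}_{\Omega^\perp}W^S\|_\infty > \frac{2\lambda\eta\sqrt{\rho\,\gamma\log(mn)}}{1 - \eta^2\rho}\right] \le \frac{2}{mn} + \mathbb{P}[E^c].$$

Thus, we have the desired bound.

Controlling $\|W^S\|$. The proof is identical to the one in Section 4.3.1.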
5.4 Proof of Lemma 5 (deterministic case)

We now prove Lemma 5 under our deterministic subspace model. Once again, the basic structure of the proof is very similar to the one used in Section 4.4. So, we only provide the relevant bounds here and refer the interested reader to Section 4.4 for the detailed steps involved.

Controlling $\|W^Q\|$. The proof framework is the same as the one in Section 4.4.1. We note that the key step is to bound $\|\mathcal{P}_{Q^\perp}(UV^*)\|_F$ and $\big\|\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\big\|$, where we recall that $\Pi = \Omega \oplus T$. From Corollary 1, we already know that

$$\|\mathcal{P}_{Q^\perp}(UV^*)\|_F \le \sqrt{\frac{2\nu pr^2}{n}}.$$
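For the other quantity, we have that

$$\Big\|\sum_{k\ge0}(\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\mathcal{P}_{Q^\perp})^k\Big\| \le \frac{1}{1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^2}.$$

By Lemma 11, we have

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^2 \le \frac{\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\|^2 + \|\mathcal{P}_{Q^\perp}\mathcal{P}_T\|^2}{1 - \|\mathcal{P}_\Omega\mathcal{P}_T\|}.$$

From Lemma 9, we know that $\|\mathcal{P}_\Omega\mathcal{P}_T\| \le \sqrt{\rho + \epsilon}$ with high probability, provided that

$$1 - \rho \ge C_0\,\epsilon^{-2}\,\frac{\mu r\log m}{n}.$$

Suppose that the above condition holds with $\epsilon = \rho$, and assume that

$$\frac{2\nu pr}{n} \le \rho.$$

We note that both the assumptions above can be true for sufficiently large $m$ and $n$ under the assumptions of Theorem 2. Under these assumptions, along with Lemma 16 and Corollary 1, we have

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^2 \le \frac{1/4 + \rho}{1 - \sqrt{2\rho}}$$

with high probability. Thus, we have that $\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi\|^2 \le 1/2$ with high probability, provided that $\rho$ is sufficiently small. Putting all these bounds together, we get

$$\|W^Q\| \le \|W^Q\|_F \le 2\sqrt{\frac{2\nu pr^2}{n}}$$

with high probability. Under the assumptions of Theorem 2, the right-hand side can be made arbitrarily small, and hence, we have the desired result.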

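Controlling $\|\mathcal{P}_{\Omega^\perp}W^Q\|_\infty$. Once again, the proof framework is identical to that used in Section 4.4.2. The key step here is to bound $\max_{(i,j)\in\Omega^c}\|\mathcal{P}_{Q^\perp}\mathcal{P}_{\Pi^\perp}e_ie_j^*\|_F$. We first use the Neumann series to rewrite $\mathcal{P}_\Pi e_ie_j^*$ as

$$\mathcal{P}_\Pi e_ie_j^* = \left((\mathcal{I} - \mathcal{P}_\Omega\mathcal{P}_T)^{-1}\mathcal{P}_\Omega\mathcal{P}_{T^\perp} + (\mathcal{I} - \mathcal{P}_T\mathcal{P}_\Omega)^{-1}\mathcal{P}_T\mathcal{P}_{\Omega^\perp}\right)(e_ie_j^*).$$

Now, for any $(i,j) \in \Omega^c$, we have

$$\|\mathcal{P}_\Omega\mathcal{P}_{T^\perp}e_ie_j^*\|_F = \|\mathcal{P}_\Omega\mathcal{P}_T e_ie_j^*\|_F \le \sqrt{\frac{2\mu r}{n}}, \qquad \|\mathcal{P}_T\mathcal{P}_{\Omega^\perp}e_ie_j^*\|_F = \|\mathcal{P}_T e_ie_j^*\|_F \le \sqrt{\frac{2\mu r}{n}}.$$

Furthermore, by the assumption we used earlier (to bound $\|W^Q\|$), we have that $\|\mathcal{P}_\Omega\mathcal{P}_T\| \le \sqrt{2\rho}$ with high probability. Therefore, we have

$$\|(\mathcal{I} - \mathcal{P}_\Omega\mathcal{P}_T)^{-1}\| \le \sum_{k\ge0}\|\mathcal{P}_\Omega\mathcal{P}_T\|^k \le \frac{1}{1 - \sqrt{2\rho}} \le 2$$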
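with high probability, provided that $\rho \le 1/8$. Thus, we get

$$\|\mathcal{P}_\Pi e_ie_j^*\|_F \le 4\sqrt{\frac{2\mu r}{n}},$$

with high probability. Consequently, for any $(i,j) \in \Omega^c$, we have

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_{\Pi^\perp}e_ie_j^*\|_F \le \|\mathcal{P}_{Q^\perp}e_ie_j^*\|_F + \|\mathcal{P}_{Q^\perp}\mathcal{P}_\Pi e_ie_j^*\|_F \le \sqrt{\frac{\nu p}{n}} + 4\sqrt{\frac{2\mu r}{n}}$$

with high probability.

Proceeding along the same lines as in Section 4.4.2, we have that

$$|\langle e_ie_j^*, W^Q\rangle| \le 2\sqrt{\frac{2\nu pr^2}{n}}\left(\sqrt{\frac{\nu p}{n}} + 4\sqrt{\frac{2\mu r}{n}}\right)$$

with high probability, for any $(i,j) \in \Omega^c$. Therefore, we have that

$$\|\mathcal{P}_{\Omega^\perp}W^Q\|_\infty \le 2\sqrt{\frac{2\nu pr^2}{n}}\left(\sqrt{\frac{\nu p}{n}} + 4\sqrt{\frac{2\mu r}{n}}\right)$$

with high probability. Since $\lambda = m^{-1/2}$, under the assumptions made in Theorem 2, the right-hand side can be made smaller than $\lambda/8$, provided that $n$ is sufficiently large.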

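5.5 Proof of Lemma 16

Consider the linear operator

$$\mathcal{A} = \mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\mathcal{P}_{Q^\perp} - \rho\,\mathcal{P}_{Q^\perp}.$$

It can be easily shown that

$$\mathbb{E}[\mathcal{A}] = 0.$$

First, we derive a bound for the spectral norm of the summands of $\mathcal{A}$. Let $\delta_{ij}$ be a sequence of independent Bernoulli random variables such that

$$\delta_{ij} = \begin{cases} 1, & \text{if } (i,j) \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$

Then, we can rewrite $\mathcal{A}$ as

$$\mathcal{A} = \sum_{(i,j)}\mathcal{A}_{ij},$$

where

$$\mathcal{A}_{ij} = \delta_{ij}\,\mathcal{P}_{Q^\perp}(e_ie_j^*)\otimes\mathcal{P}_{Q^\perp}(e_ie_j^*) - \frac{\rho}{mn}\mathcal{P}_{Q^\perp},$$

and $\otimes$ denotes the outer or tensor product between matrices. Then, we have that

$$\begin{aligned}
\|\mathcal{A}_{ij}\| &\le \|\mathcal{P}_{Q^\perp}(e_ie_j^*)\otimes\mathcal{P}_{Q^\perp}(e_ie_j^*)\| + \frac{\rho}{mn} \qquad (73)\\
&\le \|\mathcal{P}_{Q^\perp}(e_ie_j^*)\|_F^2 + \frac{\rho}{mn} \qquad (74)\\
&\le \frac{\nu p}{n} + \frac{\rho}{mn} \qquad (75)\\
&=: B, \qquad (76)
\end{aligned}$$

where in Eqn. (74) we used the fact that $\|X\otimes Y\| \le \|X\|_F\,\|Y\|_F$.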
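We now bound the variance terms.

We let $\mathcal{P}_{\Omega_{ij}}$ denote the orthogonal projector onto the subspace $\mathrm{span}(e_ie_j^*)$. Clearly, we have

$$\mathcal{P}_\Omega = \sum_{(i,j)\in\Omega}\mathcal{P}_{\Omega_{ij}}.$$

Furthermore, we note that

$$\mathcal{P}_{Q^\perp}(e_ie_j^*)\otimes\mathcal{P}_{Q^\perp}(e_ie_j^*) = \mathcal{P}_{Q^\perp}\mathcal{P}_{\Omega_{ij}}\mathcal{P}_{Q^\perp}.$$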



Thus, we get 



E 



mn 



mn 



Similarly, we have 



Let $X \in \mathbb{R}^{m\times n}$ be any matrix satisfying $\|X\|_F = 1$. Then,



p 



i,i \A:=1 y 

/ P N 



(77) 
(78) 



r 

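where we recall that the $G_k$'s constitute an orthonormal basis for $Q^\perp$ satisfying $\max_k\|G_k\|^2 \le \nu/n$. We now bound $\|\mathcal{P}_{\Omega_{ij}}G_k\|_F$ as follows:

$$\begin{aligned}
\|\mathcal{P}_{\Omega_{ij}}G_k\|_F &= |\langle e_ie_j^*, \mathcal{P}_{Q^\perp}G_k\rangle| \\
&\le \|G_k\|\,\|\mathcal{P}_{Q^\perp}e_ie_j^*\|_* \\
&\le \|G_k\|\,\sqrt{n}\,\|\mathcal{P}_{Q^\perp}e_ie_j^*\|_F \\
&\le \sqrt{\frac{\nu}{n}}\cdot\sqrt{n}\cdot\sqrt{\frac{\nu p}{n}} = \nu\sqrt{\frac{p}{n}}.
\end{aligned}$$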
n y n 
Combining the above bound with Hölder's inequality, we get



^,3 



< 



< V 



v_ 

n 



\k=l 



Therefore, we have that the variance in Eqn. (75) can be bounded as

< Pl^\ 1 1 

V n mn mn 



V n 

Applying the matrix Bernstein inequality (Theorem 3), we get

$$\mathbb{P}\left[\|\mathcal{A}\| > t\right] \le 2mn\exp\left(-\frac{t^2/2}{\sigma^2 + Bt/3}\right)$$



< 2m cxp 



Let us set $t = \rho$. Now, suppose that



V p log m 



(79) 

(80) 
(81) 



(82) 



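where $C_3 > 0$ is a numerical constant. Then, under the conditions of Theorem 2, $\|\mathcal{A}\|$ is bounded from above by $\rho$ with high probability. Since $\mathcal{A} = \mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\mathcal{P}_{Q^\perp} - \rho\,\mathcal{P}_{Q^\perp}$, this implies that

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\mathcal{P}_{Q^\perp}\| \le 2\rho \tag{83}$$

with high probability. It follows, since $\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\|^2 = \|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\mathcal{P}_{Q^\perp}\|$, that

$$\|\mathcal{P}_{Q^\perp}\mathcal{P}_\Omega\| \le \sqrt{2\rho} \le 1/2, \tag{84}$$

with high probability, provided that $\rho$ is sufficiently small ($\rho \le 1/8$ suffices) and $\nu^2p^2\log m/n \le C$, where $C$ is a numerical constant.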